

AD-A163 837




MODIFIED KOLMOGOROV-SMIRNOV,
ANDERSON-DARLING, AND CRAMER-VON MISES
TESTS FOR THE PARETO DISTRIBUTION WITH
UNKNOWN LOCATION AND SCALE PARAMETERS


THESIS 


DISTRIBUTION STATEMENT A
Approved for public release;
Distribution Unlimited


James E. Porter III
Captain, USAF

AFIT/GSO/MA/85D-6


DEPARTMENT OF THE AIR FORCE

AIR UNIVERSITY

AIR FORCE INSTITUTE OF TECHNOLOGY




MODIFIED KOLMOGOROV-SMIRNOV,
ANDERSON-DARLING, AND CRAMER-VON MISES TESTS
FOR THE PARETO DISTRIBUTION
WITH UNKNOWN LOCATION AND SCALE PARAMETERS

THESIS 

Presented  to  the  Faculty  of  the  School  of  Engineering 
of  the  Air  Force  Institute  of  Technology 

Air University

in Partial Fulfillment of the

Requirements for the Degree of

Master of Science in Space Operations



James E. Porter III, B.S.
Captain,  USAF 

December  1985 


Approved  for  public  release;  distribution  unlimited 




PREFACE


This thesis develops goodness-of-fit tests for the
Pareto distribution by generating critical value tables for
the modified Kolmogorov-Smirnov, Anderson-Darling, and
Cramer-von Mises statistics. These tables can be used to
test whether a set of observed values follows a Pareto
distribution when the location and scale parameters are
unspecified and must be estimated from the observed sample
data. Additionally, the power of each of the three
goodness-of-fit tests is studied and compared. Finally, the
functional relationship between the critical values and the
Pareto shape parameter is determined. Hopefully the material
is presented in sufficient detail to be easily understood by
those with only a passing knowledge of statistical analysis.

I wish to thank my reader and class advisor,
Lieutenant Colonel Joseph Coleman, who guided me throughout
my AFIT tour; and especially my thesis advisor, Dr. Albert H.
Moore,  who  maintained  my  interest  in  statistical  analysis, 
offered  constant  encouragement,  and  suggested  the  subject  of 
this  thesis.  I  also  thank  my  classmates  Majors  Dennis  Charek 
and  Denny  Danielson  for  their  help  in  debugging  the  computer 
programs  used  in  this  thesis. 

Above  all  I  thank  my  family,  especially  my  wife  Judy, 
for  their  love  and  understanding  during  my  tour  at  AFIT. 


James  E.  Porter  III 


TABLE OF CONTENTS


Preface
List of Figures
List of Tables
Abstract

I. Introduction
    Chapter Overview
    Background
    Problem Statement
    Research Question
    Objectives
    Presentation of Research

II. Goodness-of-Fit Tests
    Chapter Overview
    Introduction
    Background
    Hypothesis Testing and Test Statistics
    Empirical Distribution Function
        Using Unknown Parameters
        Kolmogorov-Smirnov Statistic
        Cramer-von Mises Statistic
        Anderson-Darling Statistic
    Chapter Summary

III. The Pareto Distribution
    Chapter Overview
    History and Application
        Origin
        Early Applications
        Recent Applications
        Air Force Applications
    The Pareto Function
    Parameter Estimation
        Various Estimators
        Best Linear Unbiased Estimator
        BLUEs for Shape c>2
        BLUEs for Shape c<2
        Summary of BLUEs
        Example 1
    Modified Test Statistics
        Hypothesized Pareto CDF
        Example 2
        Modified K-S Statistic
        Example 3
        Modified A-D Statistic
        Example 4
        Modified C-VM Statistic
        Example 5
    Chapter Summary

IV. Methodology
    Chapter Overview
    Basic Principles
        The Monte Carlo Method
        The Inverse Transform Technique
        Identifying Critical Values
        The Plotting Positions Technique
    Specific Procedures
        Stage 1: Generating Critical Value Tables
        Stage 2: Comparing Power
        Stage 3: Determining Relationship
    Chapter Summary

V. Results and Application
    Chapter Overview
    Critical Value Tables
    Power Comparison Tables
    Regression Tables
    Use of Tables
        Using Critical Value Tables
        Using Power Comparison Tables
        Using Linear Regression Tables
    Chapter Summary

VI. Analysis and Discussion
    Chapter Overview
    Critical Values
    Power Comparison
    Regression Analysis
    Verification and Validation
    Chapter Summary

VII. Conclusions and Recommendations
    Conclusions
    Recommendations

Appendix A: Computer Program for Critical Values
    Flow Chart
    Main Program CRITVAL
    Subroutine PARDEV
    Subroutine BXVALS
    Subroutine BLCLE2
    Subroutine BLCGT2
    Subroutine HYPCDF
    Subroutine TESTAT
    Subroutine CRTVAL

Appendix B: Computer Program for Power Comparison
    Flow Chart
    Main Program POWER
    Subroutine PARETO
    Subroutine BXVALS
    Subroutine BLCLE2
    Subroutine BLCGT2
    Subroutine HYPCDF
    Subroutine TESTAT
    Subroutine COMPAR

Bibliography


LIST OF FIGURES


Figure

1  Three-Parameter Pareto Curves for Shape c = 2
2  Two-Parameter Pareto Curves for Shape c = 2
3  One-Parameter Pareto Curves for Several c
4  Probability Density of One-Parameter Pareto
5  Finding Critical Values from Plotting Positions
6  Procedure for Generating Critical Values
7  Procedure for Determining Power Values


LIST OF TABLES


Table

I     Calculation of BLUEs
II    Calculation of Hypothesized Pareto CDF
III   Calculation of Modified K-S Statistic
IV    Calculation of Modified A-D Statistic
V     Calculation of Modified C-VM Statistic
VI    Critical Values for the Modified K-S Test
VII   Critical Values for the Modified A-D Test
VIII  Critical Values for the Modified C-VM Test
IX    Power Test for H0: Pareto CDF (c = 1.0)
X     Power Test for H0: Pareto CDF (c = 3.3)
XI    K-S Critical Values vs. Pareto Shape Parameter
XII   C-VM Critical Values vs. Pareto Shape Parameter




ABSTRACT 


Modified Kolmogorov-Smirnov (K-S), Anderson-Darling
(A-D), and Cramer-von Mises (C-VM) critical values are
generated for the three-parameter Pareto distribution. The values
may be used to test whether a set of observations follows a
Pareto distribution when the location and scale parameters
are unspecified and thus must be estimated from the sample.

A Monte Carlo simulation of 5000 repetitions is used to
generate critical values for sample sizes 5(5)30 (i.e., 5 to
30 in increments of 5) and Pareto shape parameters .5(.5)4.0.

A 5000-repetition Monte Carlo investigation is carried
out by using 5, 15, and 25 observations from eight alternate
distributions to compare the powers of the K-S, A-D, C-VM,
and Chi-square tests. The power values of the tests are
relatively low for a sample size of five. However, the
powers of the modified K-S, A-D, and C-VM tests are
considerably better than the Chi-square test at larger sample sizes.
Next to the Chi-square test, the A-D test has the lowest
power in most cases.

A functional relationship is identified between the
modified K-S and C-VM test statistics and the Pareto shape
parameter. The critical values are found to be a linear
function of the shape parameters between 1.5 and 4.0.


MODIFIED KOLMOGOROV-SMIRNOV,
ANDERSON-DARLING,  AND  CRAMER-VON  MISES  TESTS 
FOR  THE  PARETO  DISTRIBUTION 
WITH  UNKNOWN  LOCATION  AND  SCALE  PARAMETERS 


I.  INTRODUCTION 


Chapter  Overview 

This chapter introduces the topic of goodness-of-fit
testing and its applications. It states the problem, the
research question, and the objectives of the research.

Background

Because the Air Force depends on highly complex weapons
systems to perform its missions, factors such as the
reliability and maintainability of equipment continue to receive a
great deal of emphasis. Of particular importance to the Air
Force is the ability to forecast time-to-failure of equipment
components and expected maintenance service times.

In studying such phenomena, analysts often face the
problem of testing agreement between probability theory and
actual observations. When trying to develop a valid
statistical model of observed data, the analyst performs four basic
steps (5:332):

1. Collect and plot the raw data to develop a
histogram (frequency distribution graph).

2. Hypothesize the underlying statistical distribution
of the data by comparing the histogram to probability
density functions of known distributions.

3. Use the observed data to estimate parameters
that characterize the distribution.

4. Test the distributional assumption and parameter
estimates for goodness-of-fit. If the hypothesis (that the
data follow the assumed distribution) fails, return to step 2
(assume a different distribution) and repeat the process.

Goodness-of-fit tests measure the degree of agreement
between the distribution of an observed data sample and a
theoretical distribution. Three tests widely used for this
purpose are the Kolmogorov-Smirnov (K-S), Anderson-Darling
(A-D), and the Cramer-von Mises (C-VM). Such tests have been
developed for several well known distributions, including the
normal, exponential, Weibull, gamma, uniform, Laplace, and
others (9; 19; 34; 35). However, there are many other
distributions which have not been successfully examined for
goodness-of-fit when the parameters of the distribution are unknown.
One such distribution, which has significant potential for
Air Force applications, is known as the Pareto distribution.

The  Pareto  distribution  is  an  important  function  in 
statistical  analysis,  and  several  applications  have  been 
identified  in  the  fields  of  economics  and  operations 
research. For example, the Pareto distribution has played a
major role in investigations concerning the distributions of
city population sizes, natural resources, stock price
fluctuations, and oil field locations (28:242). Other
studies indicate that the Pareto can be used to model
phenomena which may be applicable to Air Force interests,
such as time-to-failure of equipment components (16),
maintenance service times (22), nuclear fallout dispersion
(18), and error clusters in communications circuits (7). Use
of the Pareto for such practical applications would be
enhanced by an accurate method to test goodness-of-fit of the
Pareto distribution.

Problem  Statement 

A test to determine goodness-of-fit has not been
developed for the Pareto distribution when the location and scale
parameters are unknown. Such a test would be useful in
determining whether a random sample of data taken from an
observed phenomenon behaves as the Pareto distribution.

Research Question

How can the existing K-S, A-D, and C-VM tests be
modified to produce new goodness-of-fit tests which can be
applied to the Pareto distribution when the location and
scale parameters are unknown?


Objectives

The objectives of this thesis are to:

1. Generate and document the modified K-S, A-D, and
C-VM critical value tables for the Pareto distribution.
These tables can be used to test goodness-of-fit when
parameters of the distribution are unknown.

2. Compare the powers of the modified K-S, A-D, and
C-VM tests to determine which test can best detect a false
Pareto distribution hypothesis. The power of a statistical
test is the probability of correctly rejecting a false
hypothesis.

3. Determine what (if any) functional relationship
exists between the shape parameter and the critical values
generated for the Pareto function. This relationship can
then be used to interpolate critical values corresponding to
parameters not found in the generated tables.

Presentation  of  Research 

The  report  on  this  thesis  effort  is  presented  in  seven 
chapters.  In  this,  the  first  chapter,  the  general  topic  of 
goodness-of-fit has been introduced and the problem, research
question,  and  objectives  have  been  stated. 

Chapter II describes various types of goodness-of-fit
tests; explains hypothesis testing and test statistics; and
discusses the empirical distribution function.

Chapter III describes applications of the Pareto
distribution; presents its various forms; explores parameter
estimation for the Pareto function; and develops the modified
K-S, A-D, and C-VM test statistics for the Pareto.

Chapter  IV  describes  the  basic  principles  and  specific 
procedures  used  to  satisfy  the  research  objectives. 

Chapter  V  presents  the  results  of  the  research  effort, 
including  tables  of  critical  values,  power  comparisons,  and 
regression  coefficients. 

Chapter  VI  further  discusses  the  results  of  the 
research.  Observations  are  made  concerning  the  tables  of 
critical  values,  power  comparisons,  and  regression 
coefficients. 

Chapter  VII  contains  conclusions  and  recommendations 
based  on  the  conduct  and  results  of  the  research  effort. 

Finally,  the  flow  charts  and  computer  programs  used  to 
carry  out  the  research  are  contained  in  the  appendices. 


II. GOODNESS-OF-FIT TESTS


Chapter Overview

This chapter briefly reviews the literature to provide
a background for goodness-of-fit tests. It also describes
hypothesis testing and test statistics as they relate to
goodness-of-fit. Finally, it discusses the empirical
distribution function and related statistics, including the
exact and computational forms of the Kolmogorov-Smirnov
(K-S), Anderson-Darling (A-D), and Cramer-von Mises (C-VM)
test statistics.

Introduction 

Goodness-of-fit tests measure the degree of agreement
between the distribution of an observed data sample and a
theoretical statistical distribution (13:189). For example,
a test for goodness-of-fit may involve examining a random
sample from some unknown distribution to test the hypothesis
that the underlying distribution is actually a known,
specified function (13:345). If such tests indicate a close
fit, the hypothesized distribution can then be applied in
simulation modeling to predict failure and operational
availability rates of Air Force systems and their components.


Background

For years statisticians have attempted to find test
statistics whose sampling distributions do not depend on
certain parameter values or on the explicit form of the
distribution of the population. Such tests are called
non-parametric or distribution-free tests (39:68).

Two of the oldest and best known distribution-free
tests for goodness-of-fit are the Chi-square and the
Kolmogorov-Smirnov (K-S) tests (13:189; 47:2). The Chi-square
test compares frequencies of the observed data with expected
frequencies of the hypothesized distribution. The test is
flexible enough to allow some parameters to be estimated from
the observed data, but it has some limitations. For example,
it is restricted to large sample sizes (1:73). Also, it
requires that the data be arbitrarily grouped, which may
affect the results (13:357). The K-S test compares the
cumulative distribution function (CDF) of the hypothesized
distribution against the empirical distribution function
(EDF) of the observed data sample. The K-S test can be used
for large or small samples; however, it is restricted to
distributions which are fully specified (i.e., there can be
no unknown parameters that must be estimated from the sample)
(13:357). The same limitation applies to two other related
methods, the Anderson-Darling (A-D) and the Cramer-von Mises
(C-VM) tests (19:204; 47:3-4).


In a significant development, David and Johnson (14)
found that if a distribution has only a location and scale
parameter, then the K-S and related goodness-of-fit tests are
independent of the true parameter values when the parameters
are replaced by invariant estimators. The estimators must be
invariant in the sense that if each x is transformed by
x → ax + b, then the estimate T = T(x) is similarly transformed by
T → aT + b (4:4). Therefore, critical values dependent only on
sample size and significance level can be generated (54:5).
This property also applies to a three-parameter CDF provided
the shape parameter is treated as a constant. A more
detailed explanation of this principle is included below in
the section on "Using Unknown Parameters".
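The invariance requirement can be illustrated numerically. The
following is a minimal sketch, not the estimators used in this
thesis: it assumes two simple illustrative estimators (the sample
minimum for location and the mean excess over the minimum for
scale), a hypothetical exponential sample, and arbitrary constants
a and b. Under x → ax + b the location estimate follows aT + b,
the scale estimate is multiplied by a, and the sample rescaled by
its own estimates is left unchanged, which is why a statistic
computed from the rescaled sample cannot depend on the true
location and scale.

```python
import math
import random
import statistics


def location_scale_estimates(xs):
    """Illustrative estimators (an assumption, not the thesis BLUEs):
    sample minimum for location, mean excess over it for scale."""
    loc = min(xs)
    scale = statistics.fmean(xs) - loc
    return loc, scale


def standardize(xs):
    """Rescale a sample by its own estimated location and scale."""
    loc, scale = location_scale_estimates(xs)
    return [(x - loc) / scale for x in xs]


rng = random.Random(1985)                      # fixed seed for repeatability
sample = [rng.expovariate(1.0) for _ in range(10)]
a, b = 3.0, 7.0                                # arbitrary transformation, a > 0
transformed = [a * x + b for x in sample]

loc, scale = location_scale_estimates(sample)
loc_t, scale_t = location_scale_estimates(transformed)

# Location estimate follows aT + b; scale estimate is multiplied by a.
print(math.isclose(loc_t, a * loc + b))
print(math.isclose(scale_t, a * scale))

# The standardized sample is the same before and after the transformation.
print(all(math.isclose(u, v, abs_tol=1e-9)
          for u, v in zip(standardize(sample), standardize(transformed))))
```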

Based  on  this  discovery  by  David  and  Johnson,  critical 
value  tables  for  the  K-S  and  related  tests  have  been  modified 
to  allow  their  use  in  several  cases  where  parameters  are 
estimated  from  observed  data.  In  a  modified  test,  the  form 
of  the  test  statistic  itself  remains  essentially  the  same, 
except  that  estimates  are  used  in  place  of  exact  parameters. 
However,  the  critical  values  for  a  modified  test  are 
considerably  different.  The  critical  value  tables  are  no 
longer  the  same  for  all  distributions.  Instead,  they  are 
different  for  each  different  hypothesized  distribution 
function. A modified test is still non-parametric or
distribution-free because the level of significance is still
independent of any untested assumptions regarding the
distribution of the underlying population. In fact, the form
of the hypothesized distribution is the hypothesis being
tested (13:357).

There are numerous cases for which modified tests have
already been developed. For example, Lilliefors developed a
modified K-S test for the normal (34) and exponential (35)
distributions; Ream (43) developed another set of modified
tests for the normal distribution; Woodruff, Moore, and
Cortes (53) developed a modified K-S test for the
three-parameter Weibull distribution; Bush (9) modified the
A-D and C-VM tests to expand the goodness-of-fit tests for
the Weibull distribution; Viviano (49) modified the K-S, A-D,
and C-VM tests for the gamma distribution; and Yoder (54)
developed a modified K-S, A-D, and C-VM test for the logistic
distribution. The modified K-S, A-D, and C-VM tests have
also been developed for the uniform, normal, Laplace,
exponential, and Cauchy distributions (19). Using a
different technique, Woodbury (52) too developed a set of
modified tests for the uniform distribution.

Hypothesis Testing and Test Statistics

A fundamental concept in statistical testing is the
hypothesis test. When studying a given phenomenon, it is
often desirable to determine the distribution of the
population being studied. In many cases, however, it is not
practical to observe the entire population. Instead, a
relatively small sample of the population is usually
selected, and observations are made from the small sample.

Hypothesis testing is the process of inferring from a
sample whether to "accept" a certain statement (the null
hypothesis) about the population from which the sample is
drawn. Actually, "acceptance" of the null hypothesis does
not imply that the null hypothesis is true, but that there is
insufficient evidence from the data sample to reject the
hypothesis. The null hypothesis, denoted H0, is the
hypothesis to be tested. The alternative hypothesis, denoted H1, is
equivalent to stating that H0 is not true (13:75-76).

Another key concept in statistical testing is the test
statistic, a function of random variables which is used to
help make the decision in a hypothesis test. In order to be
useful for data analysis, the test statistic chosen should
possess certain desirable properties. Most importantly, the
statistic should assign real numbers to points in the sample
so that the points are arranged in an order which reflects
their ability to distinguish between a true H0 and a false H0
(13:77). For example, the test statistic normally assigns
larger values to situations that indicate most strongly that
H0 ought to be rejected, while smaller values of the test
statistic usually indicate insufficient evidence to reject
H0. In this type of "one-tailed" test, if the value of the
test statistic for a given set of data is greater than a
certain "critical value", the analyst would reject H0
(13:77). The critical value is chosen so that when the null
hypothesis H0 is true, the chance of erroneously rejecting H0
is some specified probability (e.g., .01 or .03) (2:193).

There are two types of errors that can be made in
applying the decision criterion. The Type I error results in
rejection of H0 when H0 is true. The Type II error results
in acceptance of H0 when H0 is false. The probability of
committing a Type I error, denoted by α, is called the level
of significance of the test. The probability of a Type II
error is denoted β. The power of a statistical test,
denoted 1 - β, is the probability of correctly rejecting a
false H0 (13:79).
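The quantities defined above can be collected in symbols. The
notation W for a generic test statistic and w_α for its critical
value is introduced here only for illustration and does not appear
in the cited sources; the relations themselves restate the
definitions just given for a one-tailed test that rejects for
large values.

```latex
\begin{align*}
\alpha &= P(\text{reject } H_0 \mid H_0 \text{ true})
        = P(W > w_\alpha \mid H_0 \text{ true}), \\
\beta  &= P(\text{accept } H_0 \mid H_0 \text{ false})
        = P(W \le w_\alpha \mid H_0 \text{ false}), \\
\text{power} &= 1 - \beta
        = P(W > w_\alpha \mid H_0 \text{ false}).
\end{align*}
```

For a fixed sample size, raising the critical value lowers α but
also lowers the power, so critical values that are larger than
necessary make a test less able to detect a false hypothesis.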

Statistics  Based  on  the  Empirical  Distribution  Function 

One class of test statistic used in goodness-of-fit
testing compares an observed sample distribution function and
a hypothesized theoretical distribution function. These
statistics are based on the empirical distribution function
(EDF), and in many cases are easily calculated and
competitive in terms of power. The K-S, A-D, and C-VM test
statistics are of the EDF type (45:730).

When analyzing phenomena such as time-to-failure of
equipment components, H(x), the actual distribution function
of the phenomenon, is rarely known. Often an educated guess
of the form of the distribution is made, and the guess is
used to approximate the true distribution function. One way
to make a "good guess" is to observe several values from
random samples of the phenomenon and construct a graph that
can be used to estimate the entire unknown distribution
function H(x). One widely used method of constructing such a
graph is the empirical distribution function S(x), which
equals the fraction of observed values that are less than or
equal to x (47:1), i.e.,

\[
S(x) = \frac{\text{number of values} \le x}{\text{total number of values}} \tag{1}
\]

For a sample consisting of n observations, the EDF, which may
be denoted Sn(x) to indicate the particular sample size, is a
step-shaped function where each step is of height 1/n and
occurs only at the sample values. As n becomes larger, Sn(x)
should better approximate H(x), provided that H0 is true.

When the n observations are arranged in ascending order,
i.e., letting x(1), x(2), ..., x(n) be "order statistics"
(15:4; 20:70), then Sn(x) is defined (47:1) as:

\[
S_n(x) =
\begin{cases}
0   & \text{for all } x < x_{(1)} \\
i/n & \text{for } x_{(i)} \le x < x_{(i+1)}, \quad i = 1, 2, \ldots, n-1 \\
1   & \text{for all } x \ge x_{(n)}
\end{cases}
\tag{2}
\]

Like a CDF, Sn(x) is a nondecreasing function that ranges
from zero to one in height; however, Sn(x) is determined
empirically (from an observed sample), thus its name (13:70).
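The definition of the EDF translates directly into a short
routine. The sketch below is illustrative only: the function name
make_edf and the five failure times are hypothetical and are not
taken from the thesis programs. It counts the observations less
than or equal to x with a binary search on the ordered sample and
divides by n, as in Eq (1).

```python
from bisect import bisect_right


def make_edf(sample):
    """Return the empirical distribution function Sn(x) of a sample.

    Sn(x) is the fraction of observations less than or equal to x.
    """
    ordered = sorted(sample)          # the order statistics x(1) <= ... <= x(n)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must contain at least one observation")

    def edf(x):
        # bisect_right counts how many ordered values are <= x
        return bisect_right(ordered, x) / n

    return edf


# Hypothetical times-to-failure, used only to exercise the function.
times = [12.0, 3.5, 7.2, 3.5, 20.1]
s5 = make_edf(times)

for x in (3.0, 3.5, 10.0, 20.1, 100.0):
    print(x, s5(x))
```

Because the value 3.5 occurs twice in this sample, the step at
that point has height 2/n rather than 1/n; with continuous data
such ties are not expected and every step has height 1/n.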


In a typical test for goodness-of-fit, a random sample
from an unknown distribution is examined to test the null
hypothesis that the unknown CDF H(x) is in fact a known,
specified function F(x), i.e., H0: H(x) = F(x). The random
sample is compared with the hypothesized distribution F(x) in
some way to determine whether it is reasonable to conclude
that F(x) is the true CDF of the random sample. Using the
EDF Sn(x) is one way to compare the random sample with F(x).
The fact that Sn(x) is, by definition, the proportion of a
random sample less than or equal to x implies that it should
serve as a good estimate of F(x), which is defined as the
probability that the random variable X is less than or equal
to the value x (47:1).
Since the EDF Sn(x) may be useful as an estimator of the
hypothesized CDF F(x), then Sn(x) can be compared with F(x)
to see if there is close agreement. If the level of
agreement is poor, then the null hypothesis is rejected,
i.e., the true but unknown CDF H(x) is not the same as the
hypothesized function F(x) (13:345).

Based  on  this  approach,  the  K-S,  A-D,  and  C-VM  tests 
use  criteria  that  measure  the  discrepancy  or  "distance" 
between  the  hypothesized  CDF  F(x),  which  approximates  H(x) 
under  Hq,  and  the  EDF  Sn(x).  The  definitions  of  the  three 
criteria  relate  to  the  full  range  of  x,  leading  to  integral 
forms  of  the  A-D  and  C-VM  test  statistics.  Conveniently,  all 
three  test  statistics  can  be  expressed  in  computational  forms 
in terms of F and Sn at the observed x values (19:204).


Using Unknown Parameters. In their unmodified forms,
most popular goodness-of-fit tests based on EDF statistics,
including the K-S, C-VM, and A-D tests, are meant to be used
only when the null-hypothesized distribution F(x) is fully
specified (i.e., when all parameters are known). However,
cases are rare in statistical practice when H0 is completely
specified; thus, it is more realistic to have unknown
parameters for the null distribution. When unknown
parameters are involved, the K-S, C-VM, and A-D tests are no
longer distribution-free, so that different critical values
will relate to different F(x) in the null hypothesis
(19:204). The reason for this is that the distributions of
these and other EDF statistics depend on the sample size n
and also on the values of the unknown parameters (47:4).

The K-S, C-VM, and A-D tests depend on the probability
integral transformation described by David and Johnson (14).
This transformation, when applied to a random sample from a
distribution of specified parameters, produces ordered values
from a uniform distribution over the interval from 0 to 1.
These values are then used to calculate the EDF test
statistic. As a result, the EDF statistic becomes a function
of ordered uniform random variables. However, when
parameters are unknown and must be estimated from the sample,
the transformation fails to produce ordered uniform random
variables (47:4). Unless appropriately modified, therefore,
any EDF tests based on this transformation will generally be
restricted to cases where all parameters are specified.
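The fully specified case of the probability integral
transformation can be checked by simulation. The following is a
minimal sketch under stated assumptions: an exponential
distribution with a known rate stands in for the hypothesized
distribution (the Pareto CDF is not introduced until the next
chapter), and the sample size, seed, and function name are
arbitrary choices. If the transformed values are uniform on the
interval from 0 to 1, their mean should be close to 1/2 and their
variance close to 1/12.

```python
import math
import random


def exponential_cdf(x, rate):
    """CDF of an exponential distribution with a known rate parameter."""
    return 1.0 - math.exp(-rate * x) if x > 0.0 else 0.0


rng = random.Random(42)               # fixed seed for repeatability
rate = 2.0                            # assumed known (fully specified case)
sample = [rng.expovariate(rate) for _ in range(10_000)]

# Probability integral transformation: u = F(x), then order the values.
u = sorted(exponential_cdf(x, rate) for x in sample)

mean_u = sum(u) / len(u)
var_u = sum((v - mean_u) ** 2 for v in u) / len(u)

print("mean of transformed values:    ", round(mean_u, 4))
print("variance of transformed values:", round(var_u, 4))
print("uniform(0,1) reference values: ", 0.5, round(1.0 / 12.0, 4))
```

Replacing the known rate with a value estimated from the same
sample ties the transformed values to the sample itself, which is
the difficulty the paragraph above describes.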

An important exception occurs if the unknown parameters
are location and scale only. David and Johnson (14) showed
that if a distribution can be completely specified by a
single parameter for location and a single parameter for
scale, then goodness-of-fit tests based on the probability
integral transformation are independent of the true parameter
values when invariant estimators are used (38:384).

Fortunately, the Pareto distribution can be completely
specified by a single location and a single scale parameter
(28:239). The three-parameter form of the Pareto, presented
in the next chapter, can be expressed in terms of a single
location and scale parameter by treating the shape parameter
as a known constant. Thus, the value of each EDF test
statistic for the Pareto will depend only on the sample size
and significance level, but not on the exact values of the
unknown parameters (35:387). As a result, rather than having
to produce a separate set of critical value tables for each
set of location and scale parameters, only one set of tables
is needed for each shape parameter and each sample size n.
It is this principle, coupled with the fact that the Pareto
possesses the necessary location and scale property, that
allows the generation of valid critical value tables for the
Pareto distribution (47:5).

To accomplish this goal, the existing (unmodified) K-S,
A-D, and C-VM test statistics can be modified using an
invariant estimator; but first, the unmodified statistics are
discussed in the following sections.


The Kolmogorov-Smirnov Statistic. The K-S statistic in
its unmodified form is especially useful when sample sizes
are small and when no parameters are estimated from the data.
Often it is a more powerful test than the Chi-square for any
sample size (34:399; 39:76). However, when parameter
estimates must be made from the sample, the Chi-square test
is easily modified by reducing the number of degrees of
freedom, whereas the existing K-S critical values are overly
conservative and must be modified using Monte Carlo techniques
(5:357). In this context, the term "conservative" means that
the critical values are too large so that the actual level of
significance is smaller than the stated level of significance
(13:90).

The  K-S  test  statistic  (36:259-260;  5:270;  19:204)  is 
the  largest  (denoted  "sup"  for  supremum)  vertical  distance 
between  the  completely  specified  hypothesized  CDF  F(x)  and 
the  observed  EDF  Sn(x).  Therefore,  the  test  statistic  is 
expressed  as: 


    D = \sup_x |F(x) - S_n(x)|    (3)

which is equivalent to the computational form given by

    D = \max(D^+, D^-)    (4)

H_0 is rejected if D exceeds a corresponding critical value
(13:358).

If there are n observations, x_{(i)} is the i-th smallest
observation, and z_i = F(x_{(i)}), then (39:69):

    D^+ = \sup_{1 \le i \le n} [(i/n) - z_i]  and  D^- = \sup_{1 \le i \le n} [z_i - (i-1)/n]    (5)

Thus  the  K-S  statistic  is  the  larger  of  these  two  values. 
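
As an illustration only, the computational form in equations (4) and (5) can be sketched in a few lines of Python. The function name `ks_statistic`, the three-point sample, and the uniform CDF used as the fully specified hypothesis are assumptions made for this example and do not come from the thesis.

```python
def ks_statistic(sample, cdf):
    """K-S statistic D = max(D+, D-) for a fully specified CDF (eqs. 4 and 5)."""
    z = [cdf(x) for x in sorted(sample)]  # z_i = F(x_(i))
    n = len(z)
    d_plus = max(i / n - z[i - 1] for i in range(1, n + 1))
    d_minus = max(z[i - 1] - (i - 1) / n for i in range(1, n + 1))
    return max(d_plus, d_minus)


# Hypothetical data tested against the uniform(0, 1) CDF, F(x) = x.
d = ks_statistic([0.7, 0.1, 0.4], lambda x: x)
```

For this small sample the largest gap occurs at the top order statistic, where 3/3 - 0.7 = 0.3, so D is about 0.3.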

The Cramer-von Mises Statistic. Another way to measure
the discrepancy between the hypothesized CDF F(x) and the
observed EDF S_n(x) is to use statistics of the Cramer-von
Mises family, based on the integral of the squared difference
between the EDF and the distribution tested (47:2). One such
statistic is the C-VM statistic itself (46:357):

    W^2 = n \int_{-\infty}^{\infty} [S_n(x) - F(x)]^2 dF(x)    (6)

which in computational form is (3:766; 45:731):

    W^2 = 1/(12n) + \sum_{j=1}^{n} [z_j - (2j-1)/(2n)]^2    (7)

where x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)} are n ordered observations from the
sample and z_j = F(x_{(j)}) for j = 1, 2, ..., n.

The Anderson-Darling Statistic. Another member of
the Cramer-von Mises family is the A-D statistic. To allow
more flexibility in goodness-of-fit tests, Anderson and
Darling (2:194) introduced the technique of incorporating a
weight function into the K-S and C-VM test statistics. The
result is still another method of testing the hypothesis that
n observations have been drawn from a population with
specified distribution function F(x).

Anderson and Darling (3:767) suggested using a
nonnegative weight function, here denoted \psi(u), chosen by the
analyst to accentuate the values of S_n(x) - F(x) in those
areas where the test is desired to have greater sensitivity.
This weight function serves to counteract the fact that the
discrepancy between S_n(x) and F(x) becomes smaller in the
tails, since each approaches 0 and 1 at the extremes (47:2).
They found that choosing the weight function \psi in the form of
\psi(u) = 1/[u(1-u)] has the effect of heavily weighting the
discrepancy in the tails of the two distributions. The
resulting A-D test statistic (2:193; 46:357) is:

    A^2 = n \int_{-\infty}^{\infty} [S_n(x) - F(x)]^2 \psi[F(x)] dF(x)    (8)

where \psi[F(x)] = [F(x) (1 - F(x))]^{-1}.

Thus the C-VM statistic may be considered a special case of
the A-D statistic where \psi[F(x)] = 1.


In computational form the A-D statistic is (3:765):

    A^2 = -n - (1/n) \sum_{j=1}^{n} (2j-1) [\ln z_j + \ln(1 - z_{n+1-j})]    (9)

where x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)} are n ordered observations from the
sample and z_j = F(x_{(j)}) for j = 1, 2, ..., n.
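
Equation (9) can be sketched in Python as shown below. This is an illustrative sketch: the name `ad_statistic` and the example data are assumptions, and the function presumes every z_j lies strictly between 0 and 1 so that both logarithms are defined.

```python
import math


def ad_statistic(sample, cdf):
    """Anderson-Darling A^2 for a fully specified CDF (eq. 9)."""
    z = [cdf(x) for x in sorted(sample)]  # z_j = F(x_(j)), each assumed in (0, 1)
    n = len(z)
    total = sum(
        (2 * j - 1) * (math.log(z[j - 1]) + math.log(1.0 - z[n - j]))
        for j in range(1, n + 1)  # z[n - j] is z_{n+1-j} in 1-based notation
    )
    return -n - total / n


# Hypothetical data tested against the uniform(0, 1) CDF, F(x) = x.
a2 = ad_statistic([0.7, 0.1, 0.4], lambda x: x)
```

Because of the logarithms, observations whose z values fall near 0 or 1 contribute heavily, which is the tail weighting described above.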

The A-D statistic is designed to be used when the
analyst wants the test to have good power against
alternatives in which F(x) and H(x), the true distribution, disagree
near the tails of F(x), and is willing to sacrifice power
against alternatives in which they disagree near the median
of F(x) (3:767). Thus, the A-D statistic is used when the
analyst wants to reject H_0 if H(x) differs greatly from F(x),
and especially if the difference is in the tails.

Chapter  Summary 

The K-S, A-D, and C-VM tests are non-parametric tests of
goodness-of-fit which offer advantages over the older
Chi-square  test.  In  their  usual  forms,  the  K-S,  A-D,  and 
C-VM  tests  are  restricted  to  distributions  which  are  fully 
specified.  However,  when  location  and  scale  parameters  are 
replaced  by  invariant  estimators,  the  three  tests  can  be 
modified  to  produce  valid  critical  values  for  a  given 
distribution.  Hypothesis  testing  and  test  statistics  are  two 
statistical  concepts  which  can  be  used  to  modify  the  existing 
tests  for  the  Pareto  distribution,  which  is  discussed  in 
detail  in  the  next  chapter. 


III. The Pareto


Chapter Overview

This chapter reviews the history and application of the
Pareto Law; presents the Pareto distribution and its three
parameters; explores parameter estimation for the Pareto
function; and develops the modified Kolmogorov-Smirnov (K-S),
Anderson-Darling (A-D), and Cramer-von Mises (C-VM) test
statistics for the Pareto distribution.


History  and  Application 

Origin. The Pareto distribution is an important
function in statistical analysis. It is named after Vilfredo
Pareto (1848-1923), a Swiss professor of economics who
conducted the first extensive statistical study of the
distribution of incomes. His analysis of nineteenth century income
in various countries led to the development of his first law:


. . . if x signify [sic] a given income and N the
number of persons with incomes exceeding x, and if
a curve be drawn, of which the ordinates are
logarithms of x and the abscissae logarithms of N,
this curve, for all the countries examined, is
approximately a straight line . . . This means
that, if the number of incomes greater than x is
equal to N, the number greater than mx is equal to
N/m^{1.5}, whatever the value of m may be. Thus the
scheme of income distribution is everywhere the
same [42:647].


Therefore, "the logarithm of the percentage of units with an
income greater than some value is a linear function of that
value with negative slope, provided that this value is
greater than an appropriate positive number" (32:6). This is
known as the "strong" form of the Pareto Law, with functional
form given by equation (11) below. The "weak" form of the
law pertains to the asymptotic nature of a distribution's
tail and implies that if log [1 - F_X(x)] is plotted against log
x, then the resulting curve should be asymptotic to a line
with slope -c as x gets larger (32:6; 28:245).

Early Applications. Since the early days of its formulation,
the Pareto Law and its related distribution functions
have been examined primarily for potential applications in
economics and operations research.

Based  on  his  statistical  observations,  Pareto  believed 
that  any  influence  that  causes  an  increase  in  the  national 
income  overall  must  also  increase  the  income  of  the  poor: 

"We  cannot  be  confronted  with  any  proposal  the  adoption  of 
which  would  both  make  the  dividend  larger  and  the  share  of 
the  poor  smaller,  or  vice  versa"  (42:648).  Pareto  also 
believed  his  law  to  be  universally  inevitable,  regardless  of 
economic,  social,  and  political  conditions.  Economists  have 
since  identified  flaws  (11:609;  17:171)  in  the  Pareto  Law  to 
the  extent  that  for  several  years  the  Pareto  distribution 
became  disreputable  (28:233;  7:235)  as  an  economic  predictor: 




The  general  defence  of  "Pareto’s  Law"  as  a  law  of 
even  limited  necessity  rapidly  crumbles.  His 
statistics  warrant  no  inference  as  to  the  effect  on 
distribution  of  the  introduction  of  any  cause  that 
is  not  already  present  ...  This  consideration  is 
really  fatal;  and  Pareto  is  driven,  in  effect,  to 
abandon the whole claim [42:654].


Nevertheless,  more  recent  studies  have  shown  the  Pareto 
distribution  can  be  very  useful. 

Recent  Applications.  Several  more  recent  studies  have 
revived  interest  in  the  Pareto  distribution  by  demonstrating 
that  it  can  be  used  to  model  or  predict  numerous  empirical 
phenomena.  For  example,  the  Pareto  distribution  has  played  a 
major  role  in  investigations  concerning  city  population  size, 
resources,  stock  price  fluctuations,  and  oil  fields  (28:242). 
The  Pareto  has  also  been  used  to  describe  property  values, 
inheritance,  business  mortality,  worker  migration,  consumer 
prices,  and  effects  of  underreported  income  (32:7;  51). 

Fisk  (17:171,  174-175)  showed  that  in  some  cases  the 
Pareto  distribution  offers  an  improvement  over  the  lognormal 
distribution,  especially  at  the  extremities  (tails)  of  the 
distribution.  Steindl  (44:187-246)  cited  several  examples  of 
empirical  economic  data  which  follow  the  Pareto  distribution, 
including  the  distribution  of  wealth,  jobs  by  basic  salary, 
the  growth  rate  of  firms  and  corporations,  and  several 
others. He also reaffirmed the Pareto Law's usefulness in
economic theory:


Empirical laws are rare in economics, and the most
obvious instance of such laws is the regular
pattern of certain statistical distributions, such as
the distribution of persons according to income or
of business firms according to sales. A good many
of these distributions conform to the so-called law
of Pareto, i.e. the number of firms (for example)
with sales in excess of X, plotted against X on
logarithmic paper, is a straight line ... The
Pareto distribution is encountered in many fields
and often the fit is very good [44:11].


Air Force Applications. Other studies have shown that
the Pareto can be used to model phenomena which may be
applicable to Air Force interests, such as time-to-failure of
equipment, maintenance service times, nuclear fallout
particles, and error clusters in communication circuits.

For  example,  Davis  and  Feldstein  (16:299)  showed  the 
Pareto  can  be  used  to  model  survival  data  based  on  a 
population of items whose times-to-failure from a well
defined origin are being observed. If each member of the
population has a constant hazard rate based on a
two-parameter gamma distribution, then the time-to-failure
for the population is the Pareto type II of equation (13).
Further,  in  some  cases  the  Pareto  competes  with  the  Weibull 
distribution  as  a  model  for  failure  times  of  components. 

Like  the  Weibull,  the  generalized  Pareto  includes  the 
exponential,  and  can  therefore  be  used  to  test  departures 
from  the  exponential  (16:305-306). 

Kaminsky and Nelson (30) showed how the Pareto distribution
can be used in situations involving life testing,
reliability, and replacement policy. Specifically, they
showed  how  to  use  the  Pareto  to  predict  the  time  of  future 
failures  from  times  of  early  failures  in  the  same  sample. 

They  found,  for  example,  that  if  items  are  put  into  service 
simultaneously,  and  it  becomes  necessary  to  begin  replacing 
them  when  a  certain  percentage  remain  functional,  then  it  is 
possible  to  predict  the  replacement  time  of  future  failures 
from  the  early  failure  times.  In  another  example,  "if  n 
items  form  an  n-component  parallel  system,  then  we  can 
predict  the  time  of  system  failure  ..."  (30:145). 

The  Pareto  distribution  can  also  be  of  use  in  modeling 
queuing  systems  in  which  equipment  maintenance  service  times 
are  conditioned  upon  a  random  parameter.  Harris  (22:307) 
showed  that  if  the  conditional  service  distribution  is 
exponential  and  the  random  parameter  has  a  gamma  density, 
then  the  resultant  service  times  follow  the  Pareto 
distribution.  Further,  if  a  system  consists  of  components 
which have exponentially distributed times-to-failure with a
gamma  parameter  density,  then  the  unconditional  times  to 
failure  would  follow  the  Pareto  distribution  (22:312). 

Harris  also  used  the  Pareto  to  develop  a  model  which  provides 
a  means  of  obtaining  measures  of  effectiveness  of  a  large 
scale  and  complicated  queuing  process  (22:308-309). 
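
The exponential-gamma mixture result can be checked numerically. The sketch below assumes the random parameter is the exponential rate, drawn from a gamma density with shape c and scale 1/b; under that assumption the unconditional survival probability is (1 + t/b)^{-c}, which is the three-parameter Pareto of this chapter with location zero. The function names, the seed, and the parameter values are choices made for this example only.

```python
import random


def simulate_mixture(b, c, size, rng):
    """Draw times that are exponential given a gamma-distributed rate."""
    times = []
    for _ in range(size):
        rate = rng.gammavariate(c, 1.0 / b)  # gamma with shape c and scale 1/b
        times.append(rng.expovariate(rate))  # exponential time given that rate
    return times


def pareto_survival(t, b, c):
    """Pr[T > t] for the Pareto with location 0, scale b, and shape c."""
    return (1.0 + t / b) ** (-c)


rng = random.Random(12345)
b, c = 2.0, 3.0
times = simulate_mixture(b, c, 20000, rng)

# Largest gap between empirical and theoretical survival at a few time points.
max_gap = max(
    abs(sum(t > q for t in times) / len(times) - pareto_survival(q, b, c))
    for q in (0.5, 1.0, 2.0, 4.0)
)
```

With 20,000 draws the empirical survival fractions should sit within sampling error of the Pareto values, so `max_gap` is expected to be small.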

Freiling showed that the Pareto distribution, in the
form of equation (10) with c = 3, can be used to model mass
sizes of nuclear fallout particles (18:4). In addition, he
compared the usefulness of the Pareto and lognormal
distributions  in  modeling  the  size  distribution  of  particle 
mass  in  the  fallout  from  land-surface  bursts.  For  this 
specific  application,  Freiling  found  close  similarities 
between  the  two  distributions:  "The  agreement  is  such  that 
if  one  curve  is  correct,  the  other  will  never  be  proved  wrong 
.  .  .  Thus  it  appears  that  the  differences  between  the  two 
approaches  are  trivial"  (18:12).  He  concluded  his  study  by 
noting  that,  in  the  case  of  nuclear  airburst  debris,  the 
lognormal distribution has the advantage of having an
"observationally confirmed theoretical basis." If the
observational  data  is  truncated,  however,  the  Pareto 
distribution  has  the  advantage  of  simplifying  calculations  of 
particle  surface  distribution. 

In a study of error clusters in communication circuits,
Berger and Mandelbrot (7:224) revealed still another
application of the Pareto distribution. They proposed a new
mathematical model to describe the distribution of the occurrence
of errors in data transmission over telephone lines. They
found that the statistics of communications errors can be
described in terms of an error probability depending solely
on the time elapsed since the last occurrence of an error.
Further, they discovered that the distribution of inter-error
intervals closely approximates the Pareto distribution of
exponent less than one. As a result, the relative number of
errors tends to zero as message lengths increase.




The Pareto Function

Pareto's Law in its original form can be expressed as
N = A x^{-c}, where A and c are parameters which characterize the
function and N is the number of people having income of at
least x. In a form more commonly used in statistical
analysis, Pareto's Law becomes the Pareto distribution:

    P(x) = \Pr[X \ge x] = (k/x)^c    for k, c > 0; x \ge k    (10)

where P(x) is the probability that the value of a random
variable X (e.g., income) is at least x, k is a lower bound on X
(e.g., some minimum income), and c characterizes the shape of
the graph of the distribution (20:233-234).

Accumulated probabilities over the range of values of x
are given by the corresponding cumulative distribution
function (CDF) of X, also known as the "Pareto distribution of
the first kind" (28:234) or the "strong" Pareto law (32:50):

    F_X(x) = 1 - (k/x)^c    for k, c > 0; x \ge k    (11)

The corresponding Pareto probability density function is:

    f_X(x) = c k^c / x^{c+1} = (c/k) (k/x)^{c+1}    for c > 0; x \ge k > 0    (12)


Pareto proposed two other forms of the distribution.
The "Pareto distribution of the second kind" (also called the
Pareto Type II or the Lomax distribution), is:

    F_X(x) = 1 - K_1 / (x + C)^c    (13)

The third form proposed by Pareto, the "Pareto distribution
of the third kind" (or Pareto Type III), has the CDF:

    F_X(x) = 1 - K_2 e^{-bx} / (x + C)^c    (14)

which reduces to the Type II form when b = 0.

The  basic  difference  between  these  various  forms  is  in 
the  number  of  parameters.  The  Pareto  distribution  of  the 
first  kind,  equation  (11),  represents  the  "usual  formulation" 
of  the  function  and  is  the  one  most  commonly  found  in  the 
literature.  However,  the  fact  that  it  consists  of  only  two 
parameters (i.e., c and k) may limit its usefulness in
general  applications.  Hastings  and  Peacock  (26)  regard  three 
types  of  parameters  as  basic  to  any  distribution  function. 
These  three  parameters  are  the  location,  scale,  and  shape 
parameters,  which  they  denote  as  a,  b,  and  c  respectively. 

The  location  parameter  (a)  represents  "the  abscissa  of  a 
location  point  (usually  the  lower  or  midpoint)  of  the  range 
of  the  variate."  The  scale  parameter  (b)  is  "a  parameter 
which  determines  the  scale  of  measurement  of  the  fractile  x". 


Finally, the shape parameter (c) "determines the shape ...
of the distribution function within a family of shapes
associated with a specified type of variate" (26:20).

Kulldorff and Vannman (33:218) introduced a more general
form of the CDF than the two-parameter form shown in equation
(11). By using the parameter notation of Hastings and
Peacock, and the functional form of Kulldorff and Vannman,
the generalized (three-parameter) form of the Pareto
distribution is illustrated in Figure 1 and can be written as:

    F(x) = 1 - [1 + (x-a)/b]^{-c}    for x > a; b, c > 0    (15)

where again a is location, b is scale, and c is shape.

In the special case when a = b, if we let k = a = b
in Figure 2, then from equation (15):

    F(x) = 1 - [1 + (x-a)/b]^{-c} = 1 - [1 + (x-k)/k]^{-c}
         = 1 - [1 + (x/k) - (k/k)]^{-c} = 1 - (1 + x/k - 1)^{-c}
         = 1 - (x/k)^{-c} = 1 - (k/x)^c

where k, b, c > 0 and x > k = a. The last expression is the
"usual formulation" given by equation (11).

Another  form  commonly  found  in  the  literature  (26:102; 
51:1)  is  the  one-parameter  form  (Figures  3  and  4)  given  by: 


    F(x) = 1 - x^{-c}    for x > 1; c > 0    (16)

[Figure 3. One-Parameter Pareto Curves (Eqn 16) for Several
Values of Shape c with k = a = b = 1.]

[Figure 4. Probability Density (Eqn 12) of the One-Parameter
Pareto with k = 1 (Reprinted from 26:103).]


Equation (16) is simply a special case of (15) found by
setting a = b = 1. As such, it represents the least general
form  of  the  Pareto  distribution. 

The  greater  generality  inherent  in  the  three-parameter 
form,  equation  (15),  allows  the  Pareto  distribution  to  be 
more  useful  in  practical  applications.  For  example,  in  some 
situations  the  random  variable  represented  by  x  may  be 
positive by its very nature, making the assumption a = 0 more
realistic than a = b (33:218). In the special case where
a = 0, the three-parameter Pareto distribution becomes:

    F(x) = 1 - [1 + (x-a)/b]^{-c} = 1 - (1 + x/b)^{-c}
         = 1 - (b/b + x/b)^{-c} = 1 - [(x+b)/b]^{-c}
         = 1 - [b/(x+b)]^c = 1 - b^c / (x+b)^c

This last expression can be written as equation (13) by
simply setting b^c = K_1 and b = C.
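
The reductions of equation (15) to its special cases can be spot-checked numerically. In the sketch below, the function names and the test point (x = 5, k = 2, b = 3, c = 2.5) are arbitrary choices for illustration and are not taken from the thesis.

```python
def pareto_cdf(x, a, b, c):
    """Three-parameter Pareto CDF of equation (15); zero at or below location a."""
    if x <= a:
        return 0.0
    return 1.0 - (1.0 + (x - a) / b) ** (-c)


def first_kind_cdf(x, k, c):
    """Two-parameter form of equation (11)."""
    return 1.0 - (k / x) ** c


def second_kind_cdf(x, k1, cap_c, c):
    """Pareto Type II form of equation (13), with constants K_1 and C."""
    return 1.0 - k1 / (x + cap_c) ** c


def one_parameter_cdf(x, c):
    """One-parameter form of equation (16)."""
    return 1.0 - x ** (-c)


x, k, b, c = 5.0, 2.0, 3.0, 2.5
gap_11 = abs(pareto_cdf(x, k, k, c) - first_kind_cdf(x, k, c))              # a = b = k
gap_13 = abs(pareto_cdf(x, 0.0, b, c) - second_kind_cdf(x, b ** c, b, c))   # a = 0
gap_16 = abs(pareto_cdf(x, 1.0, 1.0, c) - one_parameter_cdf(x, c))          # a = b = 1
```

All three gaps should vanish up to floating-point rounding, matching the algebra above.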

Therefore,  equations  (11),  (13),  and  (16)  each  represent 
special  cases  of  the  three-parameter  form  given  by  equation 
(15).  Since  (15)  is  a  more  general  and  hence  more  useful 
form  of  the  Pareto  distribution,  this  thesis  uses  the 
functional form in (15) to develop the goodness-of-fit tests
for  the  Pareto  distribution.  Selecting  the  more  general  form 
as  a  basis  for  the  test  statistics  will  ensure  the  widest 
possible  application  of  the  goodness-of-f it  tests. 


Parameter Estimation

As explained in Chapter II, the development of modified
Kolmogorov-Smirnov, Anderson-Darling, and Cramer-von Mises
tests depends on the use of an invariant estimator for the
unspecified location and scale parameters (38:384). This
section begins by briefly examining several published studies
on various estimation techniques for Pareto distributions.
It concludes by discussing the best linear unbiased estimator
(BLUE), which is the invariant estimator used in this thesis.

Various Estimators. The two methods of invariant
estimation most commonly used in modified goodness-of-fit
tests are the maximum likelihood estimator (MLE) and the best
linear unbiased estimator (BLUE). Various techniques for
estimating the parameters of the Pareto distribution can be
found in the literature. However, as Kulldorff and Vannman
(33:218) point out, few studies consider the general
three-parameter form of equation (15). Instead, most studies
consider only "special cases", such as a = b, corresponding
to equations (11) and (12).

Numerous examples of "special case" estimators can be
cited. Moore and Harter (41; 23:69,86) developed a biased,
single-order-statistic MLE for the Pareto shape parameter
when location is specified. Harris (22:308, 310-311)
considered estimation for the two-parameter form given by
equation (12): "As a first try, we can appeal to the
techniques of maximum likelihood estimation. However, this
particular method does not yield sufficiently simple
equations (for even numerical methods)" (22:310). As a
result, Harris resorted to the method of moments instead.
Johnson  and  Kotz  (28:234-240)  presented  MLEs  for  the 
two-parameter form in equation (11), as well as several other
estimation  techniques.  Hastings  and  Peacock  (26:102)  gave 
the  MLE  for  the  one-parameter  form  of  equation  (16).  In  his 
dissertation,  Koutrouvelis  (32:97-115)  attempted  to  estimate 
the  parameters  of  the  upper  tail  of  Pareto  distributions,  but 
found  it  too  difficult  to  calculate  the  Pareto  MLEs,  even 
with  a  computer.  Instead,  he  developed  a  new  method  of 
estimating  parameters  based  on  the  asymptotic  theory  of 
quantiles  using  only  data  consisting  of  sample  values  greater 
than  some  specified  value.  Wingo  (50)  wrote  a  FORTRAN 
program  to  calculate  the  MLEs  from  a  reduced  log-likelihood 
function  for  the  two-parameter  form  in  equation  (12).  Davis 
and  Feldstein  (16:299-300,  305)  developed  MLEs  from 
progressively censored data for the Pareto Type III, equation
(14). Bell, Ahmad, Park, and Lui (6:4-7) presented the MLEs,
the  minimum  variance  unbiased  estimators  (MVUEs) ,  and  the 
minimal  sufficient  statistic  (MSS)  for  the  two-parameter 
form, equation (11). Several other estimation studies are
cited by Koutrouvelis (32:55) and Johnson and Kotz
(28:235-240). Unfortunately, none of these studies provides
the invariant estimators of the three-parameter form in
equation (15) as needed for this thesis.

Parameter estimation for the general case given by
equation (15) went virtually ignored until Kulldorff and Vannman
(33) derived the BLUEs of the unknown parameters on the basis
of a complete Pareto sample with shape c > 2. In a follow-up
paper, Vannman (48) derived the BLUEs for shape c \le 2.
Later, Kaminsky (29:7-8, 12-14) and Kaminsky and Nelson
(30:148) extended the work of Kulldorff and Vannman by
deriving, for equation (15), the best linear unbiased
predictors of future observations from censored samples.
Most recently, Charek (12) examined minimum distance
estimation for the three-parameter Pareto.

Best Linear Unbiased Estimator (BLUE). The BLUE derives
its name from its main properties as an estimator. It is a
"linear" estimator because it can be expressed as a linear
function of a random sample. It is "unbiased" because its
bias term is zero; that is, the expected value of the estimator is
equal to the true parameter value. It is considered the
"best" estimator because it has the minimum variance among
all linear unbiased estimators (27:227). However, for
the purposes of this thesis, the most important property of
the BLUE is invariance under transformation of parameters.

The BLUE belongs to a larger class of estimators
known as least-squares estimators. In general, least-squares
estimators do not possess the invariance property. However,
when a least-squares estimator is also a linear function,
then the invariance property holds (40:349-350). Therefore,
in addition to its other properties, the BLUE is also an
invariant estimator. It is this property of invariance under
parameter transformations that allowed, for example, Green
and Hegazy (19:205) and Woodbury (52) to use the BLUE in
producing modified goodness-of-fit tests based on the
findings of David and Johnson (14).

Intuitively, the property of invariance implies, for
example, that if a parameter \theta is estimated, and \theta^2 is also
estimated from the same data, then the estimate of \theta^2 should
be the square of the estimate of \theta (37:434). Generally, the
invariance property requires that if f(\theta) is a single valued
function of a parameter \theta, and \hat{\theta} is the BLUE of \theta, then f(\hat{\theta})
is the BLUE of f(\theta), i.e., \hat{f}(\theta) = f(\hat{\theta}) (8:94).

The  studies  by  Kulldorff  and  Vannman  (33;  48)  derived 
the  BLUEs  of  equation  (15)  for  b  when  a  and  c  are  known;  for 
a  when  b  and  c  are  known;  and  for  a  and  b  when  c  is  known. 

The  last  case,  which  corresponds  to  invariant  estimation  of 
location  and  scale  when  shape  is  known,  is  used  in  this 
thesis  to  develop  the  modified  K-S,  A-D,  and  C-VM  tests.  The 
next two subsections use the findings of Kulldorff and Vannman
to  derive  computational  forms  of  the  BLUEs  for  the  Pareto 
location  and  scale  parameters,  assuming  shape  is  known. 


BLUEs for Shape c > 2. For the case where c > 2,
Kulldorff and Vannman (33:224-226) found that the BLUEs of
location a and scale b can be written in terms of the
specified shape parameter c and the order statistics (15:4)
x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}, where x_{(1)} is the smallest and x_{(n)}
the largest value in the observed random sample of size n.
Thus the BLUEs for a and b are, respectively:

    \hat{a} = x_{(1)} - Y / [(nc-1)(nc-2) - ncD]    (17)

    \hat{b} = Y(nc-1) / [(nc-1)(nc-2) - ncD] = [x_{(1)} - \hat{a}] (nc-1)    (18)

In the special case when it is known that a = b, as in
equation (11), the BLUE reduces to:

    \hat{a} = \hat{b} = [1 - 1/(nc)] x_{(1)}    (19)

However, before equations (17) and (18) can be used to find
the BLUEs for the general case, the following terms must
first be calculated:

    B_i = \frac{\Gamma(n-i+1) \Gamma(n+1-2/c)}{\Gamma(n-i+1-2/c) \Gamma(n+1)}    for i = 1, 2, ..., n    (20)

    D = (c+1) \sum_{i=1}^{n-1} B_i + (c-1) B_n    (21)

    Y = (c+1) \sum_{i=1}^{n-1} B_i x_{(i)} + (c-1) B_n x_{(n)} - D x_{(1)}    (22)

After these values are calculated, they can be substituted
into equations (17) and (18) to find the BLUEs \hat{a} and \hat{b}.

From equations (17) to (22), it is obvious that the use
of the BLUEs \hat{a} and \hat{b} involves the computation of all the
coefficients B_i for i = 1, 2, ..., n. Therefore, in order to
derive a computational form of the BLUEs, the first task is
to simplify equation (20). Each B_i is a ratio of products
of gamma functions. Banks and Carson (5) note that "the
gamma function can be thought of as a generalization of the
factorial notation which applies to all positive numbers, not
just integers" (5:144). For any real m > 1:

    \Gamma(m) = (m-1) \Gamma(m-1)    (23)

By definition \Gamma(1) = 1, so that whenever m is a positive integer,
equation (23) becomes:

    \Gamma(m) = (m-1)!    (24)

Applying these gamma definitions in equation (20) reveals:

    B_1 = \frac{\Gamma(n-1+1) \Gamma(n+1-2/c)}{\Gamma(n-1+1-2/c) \Gamma(n+1)}
        = \frac{\Gamma(n) \Gamma(n+1-2/c)}{\Gamma(n-2/c) \Gamma(n+1)}
        = \frac{(n-1)! (n-2/c) \Gamma(n-2/c)}{n (n-1)! \Gamma(n-2/c)}
        = \frac{n-2/c}{n}
        = 1 - 2/(cn)    (25)

Similarly, B_2 is found from equation (20) as follows:

    B_2 = \frac{\Gamma(n-2+1) \Gamma(n+1-2/c)}{\Gamma(n-2+1-2/c) \Gamma(n+1)}
        = \frac{\Gamma(n-1) \Gamma(n+1-2/c)}{\Gamma(n-1-2/c) \Gamma(n+1)}
        = \frac{(n-2)! (n-2/c) \Gamma(n-2/c)}{\Gamma(n-1-2/c) n!}
        = \frac{(n-2)! (n-2/c) (n-1-2/c) \Gamma(n-1-2/c)}{n (n-1) (n-2)! \Gamma(n-1-2/c)}
        = \frac{(n-2/c) (n-1-2/c)}{n (n-1)}
        = [1 - 2/(cn)] [1 - 2/(c(n-1))]    (26)

Continuing in this manner, it turns out that:

    B_n = [1 - 2/(cn)] [1 - 2/(c(n-1))] \cdots [1 - 2/(c(1))]    (27)

The calculations can be simplified as follows:
Let g_1 = 2/(cn), g_2 = 2/[c(n-1)], ..., g_n = 2/c.
Also let b_1 = 1 - g_1, b_2 = 1 - g_2, ..., b_n = 1 - g_n.
Then B_1 = b_1, B_2 = b_1 b_2, ..., B_n = b_1 b_2 \cdots b_n.

In general, then, each B_i can be expressed in computational
form as:

    B_i = \prod_{j=1}^{i} b_j    (28)

where b_j = 1 - g_j and g_j = 2/[c(n-j+1)] for j = 1, 2, ..., i.

From these results, if we let B_0 = 1, then another way to
write B_i is (48:705):

    B_i = [1 - 2/(c(n-i+1))] B_{i-1}    for i = 1, 2, ..., n    (29)

As mentioned earlier, once all of the B_i are computed from
equation (28) or (29), then D and Y can be computed from
equations (21) and (22). Finally, these values for B_i, D,
and Y are substituted into equations (17) and (18) to find
the BLUEs \hat{a} and \hat{b}.
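
The procedure just described can be sketched in Python as follows. This is an illustrative reading of equations (17) through (22) and (29), not the program used in the thesis; the function names and the two-point example are assumptions. The gamma-ratio form of equation (20) is included only as a cross-check on the recurrence.

```python
import math


def blue_coefficients(n, c):
    """B_1..B_n from the recurrence of equation (29), starting from B_0 = 1."""
    coeffs, prev = [], 1.0
    for i in range(1, n + 1):
        prev *= 1.0 - 2.0 / (c * (n - i + 1))
        coeffs.append(prev)
    return coeffs


def blue_coefficients_gamma(n, c):
    """B_1..B_n from the gamma ratio of equation (20); needs c > 2."""
    return [
        math.exp(math.lgamma(n - i + 1) + math.lgamma(n + 1 - 2.0 / c)
                 - math.lgamma(n - i + 1 - 2.0 / c) - math.lgamma(n + 1))
        for i in range(1, n + 1)
    ]


def pareto_blue(sample, c):
    """BLUEs (a_hat, b_hat) of location and scale for known shape c > 2."""
    x = sorted(sample)
    n = len(x)
    if c <= 2 or n < 2:
        raise ValueError("this sketch requires c > 2 and at least two observations")
    B = blue_coefficients(n, c)
    D = (c + 1) * sum(B[:-1]) + (c - 1) * B[-1]                       # eq. (21)
    Y = ((c + 1) * sum(Bi * xi for Bi, xi in zip(B[:-1], x[:-1]))
         + (c - 1) * B[-1] * x[-1] - D * x[0])                        # eq. (22)
    denom = (n * c - 1) * (n * c - 2) - n * c * D
    return x[0] - Y / denom, Y * (n * c - 1) / denom                  # eqs. (17), (18)


a_hat, b_hat = pareto_blue([1.0, 4.0], 3.0)  # approximately (0.0, 5.0)
```

For n = 2 and c = 3 the coefficients are B_1 = 2/3 and B_2 = 2/9, which gives b_hat = (5/3)(x_(2) - x_(1)). Shifting or rescaling the data shifts and rescales the estimates in the same way, which is the invariance property this chapter relies on.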

BLUEs for Shape c \le 2. For the case where c \le 2, the
variance of the Pareto distribution does not exist, so a
different approach must be used to derive the BLUEs. In this
case, Vannman (48:706-707) found that the BLUEs of location a
and scale b can still be found provided that shape c satisfies
2/n < c \le 2, where again n is the sample size. Here
the BLUEs a_k^* and b_k^* are based on the first k order
statistics only, where k is chosen so that 2 \le k < n+1-2/c:

    a_k^* = x_{(1)} - b_k^* / (nc-1)    (30)

and

    b_k^* = (1/U_k) \{ (c+1) \sum_{i=1}^{k-1} B_i x_{(i)}
            + [(n-k+1)c - 1] B_k x_{(k)}
            - [(nc-1)/(nc)] (nc-2-U_k) x_{(1)} \}    (31)

where

    U_k = \frac{(nc-2)(nc-c-2) - nc[(n-k)c - 2] B_k}{(nc-1)(c+2)}    (32)

Whenever possible, k should be chosen to achieve highest
efficiency, which occurs when k = n - [2/c], where "[2/c]"
denotes the integer portion of 2/c. Vannman (48:707) also
points out that in the case where 2/c is an integer, and k is
selected for highest efficiency so that k = n - 2/c, then
equation (31) can be simplified to:

$$b_{n-2/c}^* = \frac{(c+1)(c+2)(nc-1)}{(nc-2)(nc-c-2)}
   \Big[ \sum_{i=1}^{n-2/c} B_i\,x_{(i)} - \frac{nc-2}{c+2}\,x_{(1)} \Big] \tag{33}$$

By  substituting  this  result  for  bk*  in  equation  (30),  the 
BLUE  for  a,  based  on  the  first  n-2/c  order  statistics,  can  be 
written  in  the  following  computational  form: 


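$$a_{n-2/c}^* = x_{(1)} - \frac{(c+1)(c+2)}{(nc-2)(nc-c-2)}
   \Big[ \sum_{i=1}^{n-2/c} B_i\,x_{(i)} - \frac{nc-2}{c+2}\,x_{(1)} \Big] \tag{34}$$

Once $a_{n-2/c}^*$ has been computed, it is easy to use equation
(30) to find a computational form of the BLUE for b:

$$b_{n-2/c}^* = (nc-1)\,(x_{(1)} - a_{n-2/c}^*) \tag{35}$$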


Equations  (34)  and  (35)  give  the  BLUEs  for  location  a 
and  scale  b  provided  all  of  the  following  conditions  apply: 

1)  shape parameter c is specified

2)  2/n < c ≤ 2

3)  2/c is an integer

When sample size n = 5, 10, 15, 20, 25, or 30, then all three
of these conditions hold for shape parameter c = .5, 1, or 2.
Therefore, for these values of n and c, it appears that
equations (34) and (35) apply. There is, however, one
important exception. As explained earlier, k must be chosen
so that 2 ≤ k < n+1-2/c. In the case where n = 5 and c =
.5, notice that n+1-2/c = 2. Thus k cannot be selected as
before, since it would need to satisfy 2 ≤ k < 2, which is
not possible. As a result, the above equations fail to
provide BLUEs for the special case c = .5 and n = 5; thus,
when c = .5, this thesis will use n = 6 instead of n = 5.
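The general estimators of equations (30) to (32) can be sketched in a few lines. This is an illustrative reading of those equations, not the thesis's own code; in particular, the sum in equation (31) is taken over i = 1, ..., k-1, which is an assumption chosen so that the formula reduces to equation (33) when k = n - 2/c. The names `b_coefficients` and `blue_first_k` are hypothetical.

```python
def b_coefficients(n, c, k):
    """B_1..B_k from the recursion B_i = [1 - 2/(c(n-i+1))] B_{i-1}, B_0 = 1."""
    coeffs, b_prev = [], 1.0
    for i in range(1, k + 1):
        b_prev *= 1.0 - 2.0 / (c * (n - i + 1))
        coeffs.append(b_prev)
    return coeffs


def blue_first_k(x_sorted, c, k):
    """BLUEs (a_k*, b_k*) from the first k order statistics, eqs. (30)-(32).

    x_sorted must already be in ascending order. Assumes the sum in
    eq. (31) runs over i = 1..k-1 (see the lead-in text).
    """
    n = len(x_sorted)
    if not (2 <= k < n + 1 - 2.0 / c):
        raise ValueError("k must satisfy 2 <= k < n + 1 - 2/c")
    B = b_coefficients(n, c, k)
    nc = n * c
    x1 = x_sorted[0]
    # Equation (32)
    U = ((nc - 2) * (nc - c - 2) - nc * ((n - k) * c - 2) * B[k - 1]) / (
        (nc - 1) * (c + 2)
    )
    # Equation (31): bracketed terms, then divide by U_k
    total = (c + 1) * sum(B[i] * x_sorted[i] for i in range(k - 1))
    total += ((n - k + 1) * c - 1) * B[k - 1] * x_sorted[k - 1]
    total -= ((nc - 1) / nc) * (nc - 2 - U) * x1
    b_star = total / U
    # Equation (30)
    a_star = x1 - b_star / (nc - 1)
    return a_star, b_star
```

For example, with n = 10 and c = 1 the most efficient choice is k = n - 2/c = 8, so `blue_first_k(xs, 1.0, 8)` would use the eight smallest observations of an ordered sample `xs`. The range check also reproduces the exception noted above: for n = 5 and c = .5 no admissible k exists, and the function raises an error.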

As  explained  in  the  next  chapter,  this  thesis  uses 
sample sizes of n = 5, 10, 15, 20, 25, and 30, with shape
parameters of c = .5, 1, 1.5, 2, 2.5, 3, 3.5, and 4.  The
preceding  subsection  presented  the  BLUEs  to  be  used  for  c  = 
2.5,  3,  3.5,  and  4.  This  subsection  has  thus  far  shown  that 
equations  (34)  and  (35)  provide  the  BLUEs  for  c  =  .5,  1,  and 
2,  except  for  the  special  case  c  =  .5  and  n  =  5.  The  one 
remaining  case  to  be  addressed  is  when  c  =  1.5. 

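When the shape parameter c = 1.5, equations (34) and
(35) do not apply since condition 3) fails to hold, i.e., 2/c
is not an integer. To ensure highest efficiency, k is
selected so that k = n - [2/c], where "[2/c]" denotes the
integer portion of 2/c. Thus:

$$k = n - [2/c] = n - [1.333] = n - 1 \tag{36}$$

According to Vannman (48:707), substituting this value of k
into equations (30) to (32) gives the desired BLUEs:

$$a_k^* = a_{n-1}^* = x_{(1)} - \frac{b_{n-1}^*}{nc-1} \tag{37}$$

$$b_{n-1}^* = \frac{1}{U_{n-1}}\Big\{ (c+1)\sum_{i=1}^{n-2} B_i\,x_{(i)}
        + (2c-1)\,B_{n-1}\,x_{(n-1)}
        - \frac{nc-1}{nc}\,(nc-2-U_{n-1})\,x_{(1)} \Big\} \tag{38}$$

where

$$U_{n-1} = \frac{(nc-2)(nc-c-2) - nc\,(c-2)\,B_{n-1}}{(nc-1)(c+2)} \tag{39}$$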


Summary  of  BLUEs .  For  shape  parameter  c  =  .5,  1,  or  2, 
this  thesis  uses  equations  (34)  and  (35)  to  calculate  the 
BLUEs  for  location  parameter  a  and  scale  parameter  b; 
however,  the  case  c  =  .5  and  n  =  5  is  omitted,  since  then 
the  BLUEs  cannot  be  found.  When  c  =  1.5,  the  BLUEs  are  given 
by  equations  (37)  to  (39).  For  c  =  2.5,  3,  3.5,  or  4, 
equations (17), (18), (21), (22) and (29) are used to
calculate the BLUEs for a and b. Once the BLUEs have been
computed, the K-S, A-D and C-VM test statistics can be
modified to accommodate unspecified location and scale
parameters. An example will help to illustrate the
calculations involved.

Example 1.  In Table I the data listed under the $x_i$
column was generated from a Pareto distribution of shape
parameter c = 2.5, using equation (47) in the next chapter.
Suppose it is desired to find the BLUEs $\hat{a}$ and $\hat{b}$
based on this particular random sample of size n = 10. Since
in this case it is known that c = 2.5, the BLUEs will be
computed from equations (17) and (18). One procedure to
accomplish this is as follows:

Step 1.  Arrange the $x_i$ sample values in order from
smallest to largest. The resulting order statistics (20:70)
are listed under the $x_{(i)}$ column of Table I.


                              Table I

                       CALCULATION OF BLUES

   i     x_i      x_(i)     c_i     B_{i-1}     B_i     B_i x_(i)

   1    1.7986   1.0095    .9200    1.0000     .9200     .9287
   2    1.0684   1.0586    .9111     .9200     .8382     .8873
   3    1.3725   1.0684    .9000     .8382     .7544     .8060
   4    1.1779   1.1267    .8857     .7544     .6682     .7529
   5    1.4743   1.1779    .8667     .6682     .5791     .6821
   6    1.0095   1.3725    .8400     .5791     .4864     .6676
   7    4.8304   1.4743    .8000     .4864     .3891     .5737
   8    1.0586   1.7986    .7333     .3891     .2854     .5133
   9    1.1267   3.9974    .6000     .2854     .1712     .6844
  10    3.9974   4.8304    .2000     .1712     .0342     .1652

$D = (c+1)\sum_{i=1}^{n-1} B_i + (c-1)B_n = 17.8733$

$Y = (c+1)\sum_{i=1}^{n-1} B_i\,x_{(i)} + (c-1)B_n\,x_{(n)} - D\,x_{(1)} = 4.9407$

$\hat{a} = x_{(1)} - Y/[(nc-1)(nc-2) - ncD] = .9625$

$\hat{b} = (nc-1)(x_{(1)} - \hat{a}) = 1.128$

Step 2.  Compute each $B_i$ for i = 1,2,...,n using equation
(29). Thus:

For i=1,   $B_1 = [1 - 2/(2.5(10-1+1))]\,B_0 = (1 - 2/25.0)(1.000) = .9200$
For i=2,   $B_2 = [1 - 2/(2.5(10-2+1))]\,B_1 = (1 - 2/22.5)(.9200) = .8382$
   ...
For i=10,  $B_{10} = [1 - 2/(2.5(10-10+1))]\,B_9 = (1 - 2/2.5)(.1712) = .0342$

Table I lists all of the values of $c_i = 1 - 2/[c(n-i+1)]$ and
$B_i = c_i B_{i-1}$ as computed from equation (29).

Step 3.  Use the $B_i$ to compute D from equation (21):

$$D = (c+1)(B_1 + B_2 + \cdots + B_9) + (c-1)B_{10}$$
$$\phantom{D} = (2.5+1)(.9200 + .8382 + \cdots + .1712) + (2.5-1)(.0342)$$
$$\phantom{D} = (3.5)(5.092) + (1.5)(.0342) = 17.8733$$


Step 4.  Use the $x_{(i)}$, D, and $B_i$ values to compute Y
from equation (22). Table I lists the values of $B_i x_{(i)}$.

$$Y = (c+1)[B_1 x_{(1)} + B_2 x_{(2)} + \cdots + B_9 x_{(9)}] + (c-1)B_{10}\,x_{(10)} - D\,x_{(1)}$$
$$\phantom{Y} = (3.5)(.9287 + .8873 + \cdots + .6844) + (1.5)(.1652) - 17.8733(1.0095)$$
$$\phantom{Y} = (3.5)(6.496) + .2478 - 18.0431 = 4.9407$$


Step 5.  Use Y and D to compute $\hat{a}$ from equation (17):

$$\hat{a} = x_{(1)} - Y/[(nc-1)(nc-2) - ncD]$$
$$\phantom{\hat{a}} = 1.0095 - (4.9407)/[(25-1)(25-2) - 25(17.8733)]$$
$$\phantom{\hat{a}} = 1.0095 - 4.9407/105.1675 = .9625$$


Step 6.  Use $\hat{a}$ to compute $\hat{b}$ from equation (18):

$$\hat{b} = (x_{(1)} - \hat{a})(nc-1) = (1.0095 - .9625)(25 - 1) = 1.128$$


In this example, then, the BLUEs for a and b are $\hat{a}$ = .9625
and $\hat{b}$ = 1.128. (The values were actually generated from a
Pareto distribution with a = b = 1 and c = 2.5). Once the
BLUEs have been computed, the test statistics can be
appropriately modified.
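Steps 1 through 6 of Example 1 can be collected into one short routine. The sketch below is an illustration of equations (17), (18), (21), (22) and (29) as they are used in the example, not the thesis's original program; the function name `pareto_blue` is a hypothetical choice.

```python
def pareto_blue(sample, c):
    """BLUEs (a_hat, b_hat) of location and scale for known shape c > 2.

    Follows Steps 1-6 of Example 1. The name pareto_blue is an
    illustrative assumption, not taken from the thesis.
    """
    if c <= 2:
        raise ValueError("these equations apply only for c > 2")
    xs = sorted(sample)                       # Step 1: order statistics
    n = len(xs)
    B, b_prev = [], 1.0                       # Step 2: equation (29)
    for i in range(1, n + 1):
        b_prev *= 1.0 - 2.0 / (c * (n - i + 1))
        B.append(b_prev)
    # Step 3: equation (21)
    D = (c + 1) * sum(B[:-1]) + (c - 1) * B[-1]
    # Step 4: equation (22)
    Y = ((c + 1) * sum(b * x for b, x in zip(B[:-1], xs[:-1]))
         + (c - 1) * B[-1] * xs[-1] - D * xs[0])
    nc = n * c
    a_hat = xs[0] - Y / ((nc - 1) * (nc - 2) - nc * D)   # Step 5: eq. (17)
    b_hat = (nc - 1) * (xs[0] - a_hat)                   # Step 6: eq. (18)
    return a_hat, b_hat


if __name__ == "__main__":
    # The unordered x_i column of Table I.
    data = [1.7986, 1.0684, 1.3725, 1.1779, 1.4743,
            1.0095, 4.8304, 1.0586, 1.1267, 3.9974]
    a_hat, b_hat = pareto_blue(data, 2.5)
    print(round(a_hat, 4), round(b_hat, 3))
```

Run on the Table I data, the routine should agree with the hand calculation to within the rounding carried in the table.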

Modified  Test  Statistics 

At  the  end  of  Chapter  II,  the  standard  forms  of  the 
Kolmogorov-Smirnov (K-S), Anderson-Darling (A-D), and
Cramer-von Mises (C-VM) test statistics were presented. To
use these "unmodified" statistics with their existing
critical value tables, all parameters must be specified.
When unknown location and scale parameters are involved, the
test statistics must be modified to generate new critical
value tables before they will produce accurate results. This
section shows how to calculate the modified test statistics
using an ordered sample and the BLUEs described in the
preceding section. The notation and approach are adapted
from Littell, McClave, and Offen (36:259-260).

Hypothesized Pareto CDF.  Before computing the modified
test statistics, the hypothesized Pareto CDF must be
calculated for each value of the random sample. Let
$x_1, x_2, \ldots, x_n$ be a random sample from the Pareto
distribution with unknown location and scale parameters a and
b, and known shape c; and let $x_{(i)}$ denote the ith order
statistic (20:70). The appropriate BLUEs for location a and
scale b (as computed in the previous section), the specified
shape c, and the n ordered Pareto deviates, $x_{(i)}$, are
substituted into equation (15) to calculate the hypothesized
Pareto CDF:

$$P_i = F(x_{(i)}; \hat{a}, \hat{b}, c) = 1 - [1 + (x_{(i)} - \hat{a})/\hat{b}]^{-c} \tag{40}$$

for i = 1,2,...,n. Note that for a given shape c (e.g.,
c=2.5 or c=4) and sample size n (e.g., n=10 or n=30), a
specific, fixed pair of location and scale values (e.g.,
a=b=1 or a=0, b=1) is used to produce the random Pareto
deviates  needed  to  compute  the  hypothesized  CDF.  This  can  be 
done  without  loss  of  generality  because,  as  discussed  in 
Chapter  II,  the  use  of  invariant  estimators  (in  this  case  the 
BLUEs)  for  location  and  scale  ensures  that  the  distribution 
of  the  test  statistic  depends  only  on  the  shape  c  and  sample 
size  n,  and  is  independent  of  location  and  scale  (36:260). 
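Equation (40) amounts to evaluating the three-parameter Pareto CDF at each order statistic with the BLUEs substituted for the unknown parameters. A minimal sketch follows; the function name `hypothesized_cdf` is hypothetical, and the numerical inputs are the ordered sample and BLUEs reported in Example 1.

```python
def hypothesized_cdf(x_sorted, a_hat, b_hat, c):
    """P_i = 1 - [1 + (x_(i) - a_hat)/b_hat]^(-c) for each order statistic.

    (hypothesized_cdf is an illustrative name, not from the thesis.)
    """
    return [1.0 - (1.0 + (x - a_hat) / b_hat) ** (-c) for x in x_sorted]


if __name__ == "__main__":
    # Ordered sample and BLUEs from Example 1 (c = 2.5, n = 10).
    xs = [1.0095, 1.0586, 1.0684, 1.1267, 1.1779,
          1.3725, 1.4743, 1.7986, 3.9974, 4.8304]
    for p in hypothesized_cdf(xs, 0.9625, 1.128, 2.5):
        print(round(p, 4))
```

Because the BLUE of location lies below the smallest observation, every term in brackets exceeds 1 and each $P_i$ falls strictly between 0 and 1.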

Example 2.  In Example 1, the BLUEs for location a
and scale b were found from a sample of size n=10 generated
from a Pareto distribution having shape c=2.5. In this
example, the same sample of values $x_1, x_2, \ldots, x_{10}$ will
be used to compute the hypothesized Pareto CDF from equation
(40). Table II contains the values obtained while making the
calculations. The columns for $x_i$ and $x_{(i)}$ are duplicated
from Table I. The BLUEs $\hat{a}$ and $\hat{b}$ are as derived in
Example 1.

                             Table II

              CALCULATION OF HYPOTHESIZED PARETO CDF

    x_i      x_(i)      M_i       N_i       O_i       P_i

   1.7986   1.0095     .0470     .0417     .9030     .0970
   1.0684   1.0586     .0961     .0852     .8151     .1849
   1.3725   1.0684     .1059     .0939     .7990     .2010
   1.1779   1.1267     .1642     .1456     .7119     .2881
   1.4743   1.1779     .2154     .1910     .6460     .3540
   1.0095   1.3725     .4100     .3635     .4607     .5393
   4.8304   1.4743     .5118     .4537     .3925     .6075
   1.0586   1.7986     .8361     .7412     .2500     .7500
   1.1267   3.9974    3.0349    2.6905     .0382     .9618
   3.9974   4.8304    3.8679    3.4290     .0242     .9758

$M_i = x_{(i)} - \hat{a} = x_{(i)} - .9625$
$N_i = M_i / \hat{b} = M_i / 1.128$
$O_i = (1 + N_i)^{-c} = (1 + N_i)^{-2.5}$
Hypothesized Pareto CDF:  $P_i = 1 - O_i$


Modified K-S Statistic.  After computing all n of the
values of $P_i$ from equation (40), the modified Kolmogorov-
Smirnov test statistic is found from equation (4) by
substituting $P_i$ in place of $z_i$ in equation (5). Thus the
modified test statistic in computational form is:

$$D = \max(D^+, D^-) \tag{41}$$

where

$$D^+ = \sup_i\,[(i/n) - P_i] \quad \text{and} \quad D^- = \sup_i\,[P_i - (i-1)/n] \tag{42}$$


                            Table III

            CALCULATION OF MODIFIED K-S TEST STATISTIC

   i    x_(i)      P_i      i/n    (i-1)/n     D_i+       D_i-

   1   1.0095     .0970     .1       .0        .0030      .0970
   2   1.0586     .1849     .2       .1        .0151      .0849
   3   1.0684     .2010     .3       .2        .0990      .0010
   4   1.1267     .2881     .4       .3        .1119     -.0119
   5   1.1779     .3540     .5       .4       (.1460)    -.0460
   6   1.3725     .5393     .6       .5        .0607      .0393
   7   1.4743     .6075     .7       .6        .0925      .0075
   8   1.7986     .7500     .8       .7        .0500      .0500
   9   3.9974     .9618     .9       .8       -.0618     (.1618)
  10   4.8304     .9758    1.0       .9        .0242      .0758

$D_i^+ = (i/n) - P_i = i/10 - P_i$          $D^+ = \sup\,[(i/n) - P_i] = .1460$
$D_i^- = P_i - (i-1)/n = P_i - (i-1)/10$    $D^- = \sup\,[P_i - (i-1)/n] = .1618$

K-S Statistic:  $D = \max(D^+, D^-) = .1618$

Example 3.  Once the hypothesized Pareto CDF is
computed, the values can be used to calculate the modified
K-S test statistic. Table III continues the previous
examples by showing the computations involved in calculating
the modified K-S test statistic. As before, the calculations
are based on the n=10 order statistics introduced in Example
1, and the values of the hypothesized Pareto CDF as
computed in Example 2.
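The computation laid out in Table III can be sketched as follows. This is an illustrative implementation of the computational form of the modified K-S statistic, with `ks_statistic` a hypothetical name; the input list is the $P_i$ column of Table III.

```python
def ks_statistic(p):
    """Return (D, D_plus, D_minus) for hypothesized CDF values p[0..n-1].

    p must be ordered to match the order statistics x_(1) <= ... <= x_(n).
    (ks_statistic is an illustrative name, not from the thesis.)
    """
    n = len(p)
    d_plus = max(i / n - p_i for i, p_i in enumerate(p, start=1))
    d_minus = max(p_i - (i - 1) / n for i, p_i in enumerate(p, start=1))
    return max(d_plus, d_minus), d_plus, d_minus


if __name__ == "__main__":
    P = [0.0970, 0.1849, 0.2010, 0.2881, 0.3540,
         0.5393, 0.6075, 0.7500, 0.9618, 0.9758]
    d, d_plus, d_minus = ks_statistic(P)
    print(round(d_plus, 4), round(d_minus, 4), round(d, 4))
```

For a finite sample the suprema reduce to maxima over i = 1, ..., n, which is why the built-in `max` is sufficient here.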




                             Table IV

            CALCULATION OF MODIFIED A-D TEST STATISTIC

   j     P_j     P_{n+1-j}     L_j        M_j        N_j      (2j-1)N_j

   1    .0970     .9758      -2.3330    -3.7214    -6.0544     -6.0544
   2    .1849     .9618      -1.6879    -3.2649    -4.9528    -14.8584
   3    .2010     .7500      -1.6045    -1.3863    -2.9908    -14.9540
   4    .2881     .6075      -1.2444     -.9352    -2.1796    -15.2572
   5    .3540     .5393      -1.0385     -.7750    -1.8135    -16.3215
   6    .5393     .3540       -.6175     -.4370    -1.0545    -11.5995
   7    .6075     .2881       -.4984     -.3398     -.8382    -10.8966
   8    .7500     .2010       -.2877     -.2244     -.5121     -7.6815
   9    .9618     .1849       -.0389     -.2040     -.2429     -4.1293
  10    .9758     .0970       -.0245     -.1020     -.1265     -2.4035

$\sum_{j=1}^{n} (2j-1)\,N_j = -104.1559$

$L_j = \ln P_j$
$M_j = \ln(1 - P_{n+1-j})$
$N_j = L_j + M_j$

$A^2 = -n - (1/n)\sum_{j=1}^{n} (2j-1)[\ln P_j + \ln(1 - P_{n+1-j})]$
$\phantom{A^2} = -10 - (1/10)(-104.1559) = .4156$

Modified A-D Statistic.  The modified Anderson-Darling
test statistic is computed by substituting $P_i$ from equation
(40) in place of $z_i$ in equation (9). Thus the computational
form of the modified A-D test statistic is:

$$A^2 = -n - (1/n)\sum_{j=1}^{n} (2j-1)[\ln P_j + \ln(1 - P_{n+1-j})] \tag{43}$$

Example 4.  Table IV shows the calculations
involved in finding the value of the modified A-D test
statistic. The $P_j$ values are as computed in Example 2.
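The computational form of the modified A-D statistic translates directly into a single summation. The sketch below is illustrative only; `anderson_darling` is a hypothetical name, and the input is the $P_j$ column from the worked examples.

```python
import math


def anderson_darling(p):
    """A^2 = -n - (1/n) * sum_{j=1}^{n} (2j-1)[ln P_j + ln(1 - P_{n+1-j})].

    p holds the hypothesized CDF values in order statistic order.
    (anderson_darling is an illustrative name, not from the thesis.)
    """
    n = len(p)
    total = 0.0
    for j in range(1, n + 1):
        # p[j - 1] is P_j; p[n - j] is P_{n+1-j} with zero-based indexing.
        total += (2 * j - 1) * (math.log(p[j - 1]) + math.log(1.0 - p[n - j]))
    return -n - total / n


if __name__ == "__main__":
    P = [0.0970, 0.1849, 0.2010, 0.2881, 0.3540,
         0.5393, 0.6075, 0.7500, 0.9618, 0.9758]
    print(round(anderson_darling(P), 4))
```

The result may differ from the tabulated value in the third or fourth decimal place, since the table carries its logarithms to only four places.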



                              Table V

           CALCULATION OF MODIFIED C-VM TEST STATISTIC

   j     P_j     (2j-1)/2n    P_j - (2j-1)/2n    [P_j - (2j-1)/2n]^2

   1    .0970       .05            .0470               .0022
   2    .1849       .15            .0349               .0012
   3    .2010       .25           -.0490               .0024
   4    .2881       .35           -.0619               .0038
   5    .3540       .45           -.0960               .0092
   6    .5393       .55           -.0107               .0001
   7    .6075       .65           -.0425               .0018
   8    .7500       .75            .0000               .0000
   9    .9618       .85            .1118               .0125
  10    .9758       .95            .0258               .0007

$\sum_{j=1}^{n} [P_j - (2j-1)/2n]^2 = .0339$

$W^2 = [1/(12n)] + \sum_{j=1}^{n} [P_j - (2j-1)/2n]^2 = (1/120) + .0339 = .0423$

Modified C-VM Statistic.  The computational form of
the modified Cramer-von Mises test statistic is found from
equation (7) by substituting $P_i$ for $z_i$:

$$W^2 = [1/(12n)] + \sum_{j=1}^{n} [P_j - (2j-1)/2n]^2 \tag{44}$$

Example 5.  Table V shows the calculations
involved in finding the value of the modified C-VM test
statistic. The $P_j$ values are as computed in Example 2.
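The modified C-VM statistic is the simplest of the three to compute. A minimal sketch, with `cramer_von_mises` as a hypothetical function name and the $P_j$ values of the worked examples as input:

```python
def cramer_von_mises(p):
    """W^2 = 1/(12n) + sum_{j=1}^{n} [P_j - (2j-1)/(2n)]^2.

    p holds the hypothesized CDF values in order statistic order.
    (cramer_von_mises is an illustrative name, not from the thesis.)
    """
    n = len(p)
    squares = sum((p_j - (2 * j - 1) / (2 * n)) ** 2
                  for j, p_j in enumerate(p, start=1))
    return 1.0 / (12 * n) + squares


if __name__ == "__main__":
    P = [0.0970, 0.1849, 0.2010, 0.2881, 0.3540,
         0.5393, 0.6075, 0.7500, 0.9618, 0.9758]
    print(round(cramer_von_mises(P), 4))
```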


Chapter  Summary 

Several  applications  for  the  Pareto  distribution  have 
been found in economics and operations research. It has
played a major role in investigating the distributions of
city population size, natural resources, stock price
fluctuations, and oil field locations. Other studies show
the Pareto can be used to model phenomena which may apply to
Air Force interests, such as time-to-failure of equipment
components,  maintenance  service  times,  nuclear  fallout 
dispersion,  and  error  clusters  in  communications  circuits. 

There  are  three  basic  forms  of  the  Pareto  distribution, 
each  of  which  is  a  special  case  of  the  three-parameter  form. 
The  greater  generality  of  the  three-parameter  form  allows  the 
Pareto distribution to be more useful in practical
application. Various methods have been explored for estimation of
Pareto  parameters;  but  the  best  linear  unbiased  estimator 
(BLUE)  is  the  only  estimator  known  to  possess  the  required 
invariance  property  for  the  three-parameter  form. 

For shape parameter c = .5, 1, or 2, the BLUEs are
computed  from  equations  (34)  and  (35).  When  c  =  1.5,  the 
BLUEs  are  given  by  equations  (37)  to  (39).  For  c  =  2.5,  3, 
3.5,  or  4,  the  BLUEs  are  computed  from  equations  (17),  (18), 

(21),  (22),  and  (29).  The  BLUEs  are  used  to  compute  the 
hypothesized  distribution  function  from  equation  (40).  The 
modified  K-S,  A-D,  and  C-VM  test  statistics  can  then  be  found 


using  the  methods  presented  in  the  next  chapter. 


Chapter Overview

This  chapter  describes  the  basic  principles  and  specific 
procedures  used  to  satisfy  the  research  objectives  of  this 
thesis.  Foremost  is  the  Monte  Carlo  method  used  to  generate 
the  critical  value  tables  of  the  modified  K-S,  A-D,  and  C-VM 
goodness-of-fit tests for the three-parameter Pareto
distribution  when  only  the  shape  parameter  is  specified. 

Basic  Principles 

This  section  deals  with  some  of  the  basic  principles 
used  to  generate  critical  values.  It  begins  with  an  overview 
of  the  Monte  Carlo  method  in  general.  Next  is  discussed  the 
inverse  transform  technique  used  to  generate  random  Pareto 
deviates.  Then  the  selection  of  critical  values  is 
discussed.  Finally,  the  use  of  plotting  positions  to 
determine  percentiles  is  explained. 

The  Monte  Carlo  Method.  Mathematics  can  be  divided 
into  theoretical  and  experimental  categories.  The  primary 
distinction  is  that  "theoreticians  deduce  conclusions  from 
postulates,  whereas  experimentalists  infer  conclusions  from 
observations" (21:1). The Monte Carlo method is a branch of
experimental mathematics involving experiments using random
numbers. It has been used extensively in statistical
analysis,  operational  research,  nuclear  physics,  and  several 
other  fields  where  there  are  problems  not  easily  solved  by 
theoretical  mathematics  alone  (21:2). 

An important feature of the Monte Carlo method is its
usual reliance on computers to simulate random processes
(10:2). Also known as the method of statistical trials, it
is basically a system of techniques which allows the modeling
of random processes conveniently by digital computer. Before
the advent of the computer, a study of a random process was
considered to be complete when it was reduced to an
analytical description. The computer has now made it convenient in
many cases to solve an analytical problem by reducing it to a
random process and then simulating that process (10:vii).

Thus a basic principle of the method involves simulating
statistical experiments through computational techniques, and
then analyzing numerical characteristics observed from these
experiments (10:ix). For this reason, the Monte Carlo method
can be defined as "the construction of an artificial random
process possessing all the necessary properties, but which is
in principle realizable by means of ordinary computational
apparatus" (10:2).

The  Monte  Carlo  method  is  typically  used  to  solve 
problems  of  two  basic  types.  A  deterministic  problem  has  no 
direct  association  with  random  processes.  In  this  case  the 
Monte Carlo method is often used when the problem can be
formulated in theoretical language but cannot be solved by
theoretical  means.  Usually  the  approach  is  to  recognize  the 
underlying  problem  structure  as  resembling  some  apparently 
unrelated  random  process,  and  then  solve  the  deterministic 
problem  numerically  by  an  appropriate  Monte  Carlo  simulation. 

In  the  case  of  a  probabilistic  problem,  the  Monte  Carlo 
method  is  directly  concerned  with  the  behavior  and  outcome  of 
random  processes.  The  approach  is  to  observe  random 
variates,  chosen  so  that  they  directly  simulate  the  physical 
random  processes  of  the  original  problem.  The  desired 
solution  is  then  inferred  from  the  behavior  of  the  random 
numbers  (21:2-4).  The  latter  Monte  Carlo  approach  was  used 
in  this  thesis  to  generate  the  critical  value  tables  for  the 
goodness-of-fit tests.

The  main  weakness  in  the  Monte  Carlo  method  is  that  the 
answers  it  produces  are  to  some  degree  uncertain  since  they 
are  inferred  from  raw  observational  data  consisting  of  random 
numbers.  This  weakness  must  be  accounted  for  because: 


Whenever one is inferring general laws on the
basis of particular observations associated with
them, the conclusions are uncertain inasmuch as
the particular observations are only a more or
less representative sample from the totality of
all observations which might have been made.
Good experimentation tries to ensure that the
sample shall be more rather than less
representative ... [Monte Carlo answers] can nevertheless
serve a useful purpose if we can manage to make
the uncertainty fairly negligible, that is to say
to make it unlikely that the answers are wrong by
very much [21:4-5].


Thus  there  is  usually  no  cause  for  concern  if  the  uncertainty 
is  negligible  for  practical  purposes. 

One way of reducing uncertainty is to base the Monte
Carlo  analysis  on  a  larger  number  of  observations.  However, 
economic  and  time  constraints  must  be  considered.  “Broadly 
speaking,  there  is  a  square  law  relationship  between  the 
error  in  an  answer  and  the  requisite  number  of  observations; 
to  reduce  it  tenfold  calls  for  a  hundredfold  increase  in  the 
observations,  and  so  on"  (21:5).  Therefore,  to  avoid  using 
an  inordinate  amount  of  computer  time,  and  to  conserve 
financial  resources,  this  thesis  follows  the  common  practice 
(9; 43; 49; 52; 54)  of  using  5000  repetitions  rather  than,  say, 
10000  in  performing  the  Monte  Carlo  analysis. 
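The square-law relationship quoted above can be made concrete with a short worked calculation. It assumes (as is standard for Monte Carlo estimates, although the source does not state it in this form) that the sampling error of an estimate based on N independent repetitions is proportional to $1/\sqrt{N}$; the symbol $\varepsilon_N$ is introduced here only for illustration.

```latex
\[
\varepsilon_N \propto \frac{1}{\sqrt{N}}
\quad\Longrightarrow\quad
\frac{\varepsilon_{100N}}{\varepsilon_{N}} = \sqrt{\frac{N}{100N}} = \frac{1}{10},
\qquad
\frac{\varepsilon_{10000}}{\varepsilon_{5000}}
  = \sqrt{\frac{5000}{10000}} \approx 0.71 .
\]
```

Under this assumption, doubling the number of repetitions from 5000 to 10000 would reduce the error by only about 29 percent while doubling the computing cost.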

The Inverse Transform Technique.  To apply the Monte
Carlo method to the problem at hand requires random samples
from the Pareto distribution. The most practical way to
obtain such samples is to use a computer program to produce a
group of n numbers that seem to come from a Pareto
population. In terminology adapted from Conover (13:323-324,360),
these n numbers are called "random Pareto deviates" because
they are deliberately generated to resemble observations on
independent Pareto random variables. Previous AFIT theses
(9; 43; 49; etc.) involved distributions for which computer
programs to generate random samples were already available
from the International Mathematical Statistics Library
(IMSL). IMSL does not contain a similar subroutine for the
Pareto distribution; therefore, a computer program needed to
be written to generate random Pareto deviates.

One common method of using a computer to generate random
samples from a given distribution is to first generate a
uniform  random  sample  on  (0,1)  and  then  transform  it  into  a 
new  sample  having  the  desired  distribution.  This  method, 
called  the  inverse  transform  technique,  uses  the  fact  that 
the  random  variable  R  =  F(X)  is  uniformly  distributed  on 
(0,1),  where  X  is  a  random  variate  (5:293-298).  Thus,  every 
variate  is  related  to  the  uniform  variate  on  (0,1)  through 
its  own  inverse  distribution  function  (26:22).  Therefore,  a 
set  of  uniformly  distributed  random  numbers  is  required  to 
generate  a  random  sample  from  the  Pareto  distribution. 

Conveniently, most random number generators are
designed to generate random numbers which are uniformly
distributed on the interval (0,1) (5:293). Hence, the
inverse transform technique can be directly applied to a set
of these random numbers to generate random Pareto deviates.
However, the technique requires that for each random number
r, the equation r = F(x) must be solved for the
corresponding value of $x = F^{-1}(r)$. Therefore the technique is
practical only when the CDF F(x) has an inverse which can be
computed explicitly (5:294). Fortunately, the inverse
transformation for the Pareto distribution can easily be
expressed in closed form.


The inverse transform technique can be accomplished by
the following four-step procedure (5:294-295):

Step 1.  Compute the cumulative distribution
function (CDF) of the desired random variable X. In this
case, the CDF is the three-parameter Pareto CDF, given by
equation (15) and repeated here for convenience:

$$F(x) = 1 - [1 + (x-a)/b]^{-c} \quad \text{for } x > a;\ b, c > 0$$

Step 2.  Set F(X) = R on the range of X, where
X represents a random Pareto variable. This then becomes:

$$1 - [1 + (X-a)/b]^{-c} = R \quad \text{for } x > a \tag{45}$$

Since X is a random variable (with the Pareto distribution in
this case), then R is also a random variable. In fact, R has
a uniform distribution over the interval (0,1) (5:295).

Step 3.  Solve F(X) in terms of R to find $X = F^{-1}(R)$.
In this case the inverse is found by solving
equation (45):

$$1 - [1 + (X-a)/b]^{-c} = R$$
$$[1 + (X-a)/b]^{-c} = 1 - R$$
$$[b/b + (X-a)/b]^{-c} = 1 - R$$
$$(b + X - a)/b = (1 - R)^{-1/c}$$
$$b + X - a = b(1 - R)^{-1/c}$$

Therefore

$$X = (a - b) + b(1 - R)^{-1/c} = F^{-1}(R) \tag{46}$$


Equation (46) is called a "random variate generator" (5:295)
for the Pareto distribution. As explained in the discussion
following equation (40), a specific, fixed pair of location
and scale values can be used to generate the required
deviates without loss of generality. For this thesis, the
Pareto deviates were generated using location and scale
parameters of 1. Substituting a=b=1 into equation (46) gives:

$$X = a - b + b(1 - R)^{-1/c} = 1 - 1 + 1(1 - R)^{-1/c} = (1 - R)^{-1/c} \tag{47}$$

Since R is uniformly distributed from 0 to 1, then so is 1-R;
thus R can replace 1-R in equation (47) to yield the
particular random variate generator used to produce the
random Pareto variates for this thesis:

$$X = R^{-1/c} = (1/R)^{1/c} \tag{48}$$


Step 4.  Generate n uniform random numbers
$R_1, R_2, \ldots, R_n$ and compute the n random Pareto deviates from
equation (48). The random numbers used for this thesis were
generated on the AFIT VAX/VMS computer system using the IMSL
subroutine GGUBS. Like most random number generators
(5:293), GGUBS is designed to generate random numbers which
are uniformly distributed on the interval (0,1). Therefore,
the inverse transform technique was applied to these random
numbers to generate random Pareto deviates.
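The four-step procedure can be sketched in a few lines. This is an illustration only: the thesis used the IMSL subroutine GGUBS, whereas the sketch below substitutes Python's standard `random` module as the uniform generator, and the names `pareto_deviate` and `pareto_sample` are hypothetical.

```python
import random


def pareto_deviate(r, c, a=1.0, b=1.0):
    """Map a uniform number r in (0, 1] to a Pareto deviate.

    Uses X = (a - b) + b * r**(-1/c), i.e. equation (46) with 1 - R
    replaced by R; for a = b = 1 this is equation (48).
    """
    return (a - b) + b * r ** (-1.0 / c)


def pareto_sample(n, c, a=1.0, b=1.0, rng=None):
    """Generate n random Pareto deviates by the inverse transform technique.

    rng is any object with a random() method returning values in [0, 1);
    Python's generator stands in for GGUBS here (an assumption).
    """
    rng = rng or random.Random()
    # 1 - rng.random() lies in (0, 1], which avoids raising 0 to a
    # negative power.
    return [pareto_deviate(1.0 - rng.random(), c, a, b) for _ in range(n)]


if __name__ == "__main__":
    sample = pareto_sample(10, 2.5, rng=random.Random(12345))
    print(sorted(sample))
```

With a = b = 1 every deviate is at least 1, the location parameter, as the form $X = R^{-1/c}$ requires for R in (0,1].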

In  step  3  of  the  inverse  transform  procedure,  the 
choice  of  the  location  and  scale  values  is  arbitrary,  and  1 
was  used  here  for  convenience.  It  should  be  noted,  however, 
that  the  deviates  can  be  easily  transformed  into  deviates 
from  a  different  Pareto  distribution  (i.e. ,  one  having  the 
same  shape  c  but  different  location  a’  or  scale  b’).  The 
transformation  stems  from  the  fact  that  all  variates  having 
the  same  shape  can  be  expressed  in  terms  of  the  variate 
having location 0 and scale 1, as follows (26:21-22):

$$X_{a,b} = b\,X_{0,1} + a \tag{49}$$

where $X_{a,b}$ denotes a Pareto variate with location a and scale
b and $X_{0,1}$ is a Pareto variate with location 0 and scale 1.
The transformation to the different variate is then found by
expressing the given variate in terms of the 0,1 variate,
since:

$$X_{a,b} = b\,X_{0,1} + a \quad \text{implies} \quad X_{0,1} = (X_{a,b} - a)/b$$

Thus

$$X_{a',b'} = b'\,X_{0,1} + a' = b'[(X_{a,b} - a)/b] + a' \tag{50}$$
Therefore, given a variate having a specific pair of
values for location and scale, equation (50) can be used to
transform the variate to one having a different pair of
location and scale parameters. For example, the
transformation from a variate having location and scale a=b=1 to one
having location a'=2 and scale b'=3 is given by:

$$X_{2,3} = 3X_{0,1} + 2 = 3[(X_{1,1} - 1)/1] + 2 = 3X_{1,1} - 1$$
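Equation (50) is a one-line change of location and scale. A minimal sketch follows; the function name `relocate` and its argument names are hypothetical.

```python
def relocate(x, a, b, a_new, b_new):
    """Transform a Pareto deviate with location a and scale b into one with
    location a_new and scale b_new (same shape c), per equation (50).

    (relocate is an illustrative name, not from the thesis.)
    """
    standard = (x - a) / b          # the location 0, scale 1 variate
    return b_new * standard + a_new


if __name__ == "__main__":
    # From a = b = 1 to a' = 2, b' = 3: the result equals 3x - 1.
    for x in (1.0, 1.5, 4.0):
        print(x, relocate(x, 1.0, 1.0, 2.0, 3.0))
```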


i  J 


The random Pareto deviates generated by the inverse
transform technique were used ultimately to compute values of
the modified K-S, A-D, and C-VM test statistics. However,
these test statistics can only be useful if their
distribution functions are at least partially known (13:31). Thus,
many test statistics were computed to determine the empirical
distribution. Critical values were then identified using a
plotting positions technique. Before examining the plotting
positions technique, it may be helpful to understand how
critical values are chosen.


Identifying Critical Values.  The use of random
deviates to generate critical value tables is based on the
concept of hypothesis testing mentioned in Chapter II. Each
group of n Pareto deviates represents a simulated sample from
a parameter-specified Pareto distribution. This makes the
null hypothesis "$H_0$: H(x) = the Pareto CDF" true for each
sample of n random Pareto deviates. For each of the three
tests (K-S, A-D, and C-VM), equations (41) to (44) were used
to compute 5000 independent values of the test statistic
under the condition that $H_0$ is true (13:361). These 5000
values were then arranged in ascending order to form sets of
5000 order statistics. To determine critical values from
these 5000 statistics (15000 total for all three tests), it
is necessary to identify somehow the "critical region", i.e.,
the set of all values of the test statistic that would result
in the erroneous decision to reject the true null hypothesis
(13:78). Once the critical region is identified, then the
critical values can be selected according to a desired "level
of significance", or $\alpha$, which is the maximum probability of
rejecting a true null hypothesis. Since the use of random
Pareto deviates to compute the test statistics ensures that
$H_0$ is true, $\alpha$ can be found by determining the probability
that the test statistic will assume a value that falls within
the critical region (13:78).

Since $H_0$ is true and $\alpha$ is the maximum probability of
rejecting $H_0$, then the minimum probability of correctly
accepting $H_0$ is $1-\alpha$. This value of $1-\alpha$ represents a
certain percentile of the 5000 ordered test statistic values.
For example, the 99th percentile is some number that the test
statistic will exceed with probability .01 or less and will
be less than with probability .99 or less (13:29). It is
this percentile relationship that is used to select critical
values from the 5000 test statistics.




One possible method of using the percentiles to determine
critical values is simply to select the test statistic
value corresponding to the desired percentile level and make
that the critical value. For example, under this method, out
of a set of 5000 ordered test statistic values, the critical
value for the 90th percentile would simply be the 4500th
value (52:6). This method has some disadvantages, however,
especially when the test statistics, which represent a
discrete distribution, are used to determine critical values
for a continuous distribution. More recently, the plotting
position technique has become popular as a more accurate
method of selecting critical values for continuous
distributions (43:7).
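
The direct order-statistic method just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the demonstration data are hypothetical, and the thesis programs themselves were written in FORTRAN.

```python
import math


def order_statistic_critical_value(stats, p):
    """Return the order statistic at the 100p-th percentile.

    Hypothetical helper: picks the ceil(p*N)-th smallest of N values,
    e.g. the 4500th of 5000 for p = 0.90.
    """
    ordered = sorted(stats)
    # round() guards against floating-point noise in p * N before ceil()
    k = max(1, math.ceil(round(p * len(ordered), 9)))
    return ordered[k - 1]


# Illustrative data only: 5000 distinct values standing in for test statistics.
demo = [i / 5000 for i in range(1, 5001)]
print(order_statistic_critical_value(demo, 0.90))  # the 4500th ordered value
```

Because the result is always one of the simulated values, this approach cannot return anything between two adjacent order statistics, which is the limitation the plotting position technique addresses.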


The Plotting Positions Technique. The plotting positions
technique is one popular method of determining percentiles
of the distribution underlying a set of n ordered
sample values (24:1619; 25:317). The technique involves
using a large number of discrete values of the ordered test
statistics and locating them on a continuous spectrum by
representing the spaces between them as piecewise linear
functions. This makes it possible to linearly interpolate
the desired percentiles between discrete values of the test
statistics, thus obtaining more accurate critical values
(43:7; 52:6).

Each ordered value may be assigned a plotting position
which is its cumulative probability, thus allowing each order
statistic to be mapped onto a probability scale from 0 to 1.
As seen from equation (2), the distribution function of these
n observations is a step function which jumps from (i-1)/n to
i/n at the ith order statistic of the sample. However, if
the plotting position i/n is used, the largest value cannot
be plotted, while if (i-1)/n is used, the smallest value
cannot be plotted (24:1615). Therefore, numerous alternative
plotting conventions have been proposed, most of which have
been summarized by Harter (24), who presents various
arguments for and against each. Harter also conducted a
Monte Carlo analysis of plotting positions for several
distributions and concluded that "... the optimum choice of
plotting positions depends not only on the purpose of the
investigation, but also (definitely) on the distribution of
the variable under consideration" (25:342).

While Harter made no specific recommendation for the
Pareto, he did observe that, "As samples increase above a
sample size of 20, the differences among the positions
determined by any method of estimation decrease to the point
where they are practically unimportant" (24:1621). He also
noted that "in practice, plotting positions differ little
compared with the randomness of the data" (24:1622). Since
this thesis employed 5000 independent values of each test
statistic, well in excess of the 20 cited by Harter, use of a
single plotting convention seems justified.




The plotting convention selected for this thesis is the
median rank, which is closely approximated by the plotting
position (24:1617):

$$Y_i = \frac{i - 0.3}{n + 0.4} \tag{51}$$

where i = 1, ..., n and, for this thesis, n = 5000. Thus each
$Y_i$ value lies in the interval (0,1). The median ranks position
yields median unbiased estimates of $x_i$ for a specified $F(x_i)$
and of $F(x_i)$ for a specified $x_i$ (24:1625). Also, in highly
skewed distributions, the median ranks position tends to be
more accurate than other conventions (31:300). Another
advantage is that values of the median ranks have been
tabulated for sample sizes of 1 to 50, i.e., n = 1(1)50
(31:486-489).
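
As a rough illustration of why the median rank convention avoids the endpoint problem of i/n and (i-1)/n, the short Python sketch below evaluates all three conventions at the smallest and largest order statistics. The function name is hypothetical and the code is not taken from the thesis programs.

```python
def median_rank_positions(n):
    """Approximate median ranks Y_i = (i - 0.3)/(n + 0.4), i = 1..n."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]


n = 5000
y = median_rank_positions(n)
# Compare the conventions at both ends of the ordered sample.
print("i/n:     ", 1 / n, n / n)          # largest position equals 1
print("(i-1)/n: ", 0 / n, (n - 1) / n)    # smallest position equals 0
print("median:  ", y[0], y[-1])           # both strictly inside (0, 1)
```

The positions are also symmetric about 0.5, since $Y_i + Y_{n+1-i} = 1$ for every i.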

A detailed illustration showing how to use plotting
positions to determine critical values was presented by Ream
(43:11-23), and will only be summarized here. In graphical
terms, the technique effectively plots the 5000 ordered test
statistic values $X_{(1)}, X_{(2)}, \ldots, X_{(5000)}$ along the abscissa
(horizontal) axis and the 5000 plotting position values
$Y_1, Y_2, \ldots, Y_{5000}$ computed from equation (51) along the
ordinate (vertical) axis. These values are assigned to
positions 2 to 5001 on their respective axes. On the
vertical axis, the interval [0,1] is completed by entering
the endpoints $Y_0 = 0$ at the 1st position and $Y_{5001} = 1$ at the
5002nd position. The corresponding endpoints on the
horizontal axis are found by linear extrapolation. Thus, in
using the computer to program this technique, the arrays
corresponding to the horizontal and vertical axes are each
composed of 5002 entries, i.e., the original 5000 values and
two extrapolated endpoints.

To map the collection of 5000 discrete values onto a
fully continuous line between 0 and 1 requires extrapolation
of the endpoints of the plotting axes. The first point on
the horizontal axis, $X_{(0)}$, is computed by linearly
extrapolating from the second and third points (i.e., the first
and second order statistics), subject to a non-negativity
restriction. Extrapolation is performed by using the
standard linear slope-intercept formula Y = mX + b to
compute the endpoints $X_{(0)}$ and $X_{(5001)}$. To find the first
endpoint on the horizontal axis, the slope is calculated by:

$$m = \frac{Y_2 - Y_1}{X_{(2)} - X_{(1)}} \tag{52}$$

and the intercept is:

$$b = Y_1 - m X_{(1)} \tag{53}$$

Then the lower endpoint $X_{(0)}$ is found by:

$$X_{(0)} = (Y_0 - b)/m = (0 - b)/m = -b/m$$

The non-negativity restriction means that whenever -b/m < 0,
then $X_{(0)}$ is simply set to 0. Thus:

$$X_{(0)} = \max(0, -b/m) \tag{54}$$

The higher endpoint $X_{(5001)}$ is found in the same way as
the lower endpoint. The slope is

$$m = \frac{Y_{5000} - Y_{4999}}{X_{(5000)} - X_{(4999)}} \tag{55}$$

and the intercept is:

$$b = Y_{4999} - m X_{(4999)} \tag{56}$$

Then the second endpoint $X_{(5001)}$ is extrapolated by:

$$X_{(5001)} = (Y_{5001} - b)/m = (1 - b)/m \tag{57}$$

Once the endpoints are added to the abscissa and
ordinate axes, the 5002 discrete points on the graph are
"connected" by straight lines, thus producing a completely
continuous, piecewise linear function. The range of this
continuous function is the interval [0,1] and contains the
5000 median rank values as well as the endpoints 0 and 1.
Its domain contains the set of 5000 test statistic values and
their 2 extrapolated endpoints.

As shown in Figure 5, the desired critical value for a
given percentile is found by linearly interpolating between
two of the 5002 points used to construct the now continuous
graph. For example, to find the 95th percentile ($\alpha$ = .05),
the largest plotting position $Y_j$ is found such that $Y_j \le .95$;
thus $Y_{j+1}$ is the first position greater than .95. Then the
critical value corresponding to the 95th percentile is found
by linearly interpolating between the points $(X_{(j)}, Y_j)$ and
$(X_{(j+1)}, Y_{j+1})$ using the formulas:

$$m = \frac{Y_{j+1} - Y_j}{X_{(j+1)} - X_{(j)}} \tag{58}$$

$$b = Y_j - m X_{(j)} \tag{59}$$

$$C_p = (p - b)/m \tag{60}$$

where $C_p$ is the critical value for the 100p-th percentile.
For this thesis, critical values were calculated for p = .80,
.85, .90, .95, and .99, corresponding to the levels of
significance $\alpha$ = .20, .15, .10, .05, and .01.
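
The whole plotting position procedure (median ranks, the two extrapolated endpoints, and the final interpolation) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the thesis FORTRAN program: the function name is hypothetical, the interpolation step is assumed to use the same slope-intercept form as the endpoint formulas, and the guard for tied values is an addition of this sketch.

```python
import bisect


def plotting_position_critical_value(stats, p):
    """Interpolate the 100p-th percentile of stats via median-rank positions.

    Assumes at least two values, no ties among the two smallest or the
    two largest values, and 0 < p < 1.
    """
    x = sorted(stats)
    n = len(x)
    y = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]   # equation (51)

    # Lower endpoint, equations (52)-(54): extrapolate to Y = 0, floor at 0.
    m = (y[1] - y[0]) / (x[1] - x[0])
    b = y[0] - m * x[0]
    x_low = max(0.0, -b / m)

    # Upper endpoint, equations (55)-(57): extrapolate to Y = 1.
    m = (y[-1] - y[-2]) / (x[-1] - x[-2])
    b = y[-2] - m * x[-2]
    x_high = (1.0 - b) / m

    xs = [x_low] + x + [x_high]      # n + 2 points on the horizontal axis
    ys = [0.0] + y + [1.0]           # n + 2 points on the vertical axis

    # Largest j with ys[j] <= p; then interpolate on the segment j .. j+1.
    j = min(bisect.bisect_right(ys, p) - 1, len(ys) - 2)
    if xs[j + 1] == xs[j]:           # flat segment (tied values)
        return xs[j]
    m = (ys[j + 1] - ys[j]) / (xs[j + 1] - xs[j])
    b = ys[j] - m * xs[j]
    return (p - b) / m


# Illustrative data only: the integers 1..5000 stand in for test statistics.
print(plotting_position_critical_value(range(1, 5001), 0.95))
```

With these evenly spaced stand-in values every point lies on a single straight line, so the interpolated 95th percentile falls between the 4750th and 4751st values rather than on either of them.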

The  specific  plotting  position  procedure  performed  for 
this  thesis  is  described  in  step  7  of  the  next  section. 


[Figure 5. Plotting Positions]


Specific Procedures

By  applying  the  basic  principles  and  techniques 
described  in  the  previous  section,  the  K-S,  A-D,  and  C-VM 
tests were modified to produce new goodness-of-fit tests for
the  Pareto  distribution. 

The  research  effort  was  performed  in  three  stages,  each 
corresponding  to  one  of  the  three  research  objectives  listed 
in  Chapter  I.  The  first  stage  consisted  of  a  nine-step  Monte 
Carlo  simulation  procedure  to  produce  critical  value  tables 
for the modified K-S, A-D, and C-VM tests. The second stage
of the research compared the powers of the three modified
tests using eight alternative distributions. Finally, a
regression analysis was performed to determine the functional
relationship  between  the  critical  values  and  the  shape 
parameters.  Computer  programs  were  written  to  accomplish  the 
first  two  stages.  The  third  stage  was  performed  manually  by 
using  a  hand  calculator  to  compute  linear  relationships  by 
the  method  of  least  squares. 

Stage 1: Generating Critical Value Tables. During the
first stage, critical value tables were generated using Monte
Carlo simulation. A FORTRAN computer program was written for
this purpose and is contained in Appendix A. The accompanying
flow chart illustrates the logic flow of the program.
The following nine steps outline the procedure used:


Step  1  -  Generate  the  Data.  Random  deviates  for 
a  given  sample  size  n  were  generated  from  a  specified  Pareto 
distribution  by  using  the  IMSL  routine  GGUBS  to  generate  n 
random  numbers,  and  then  applying  the  inverse  transform 
technique  (equation  48). 

Step 2 - Order the Data. Next, the n random
deviates $x_1, x_2, \ldots, x_n$ were converted to order statistics
$x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ by arranging them in ascending order using
the IMSL subroutine VSRTA.

Step  3  -  Estimate  the  Parameters.  The  ordered 
Pareto  deviates  were  then  used  to  find  the  best  linear 
unbiased  estimates  of  the  scale  and  location  parameters  as 
explained  in  the  "Summary  of  BLUEs"  section  of  Chapter  III. 

Step 4 - Compute the Hypothesized CDF. The
estimated parameters found in step 3 were used with the n
ordered Pareto deviates from step 2 to calculate the
hypothesized cumulative distribution function (CDF) $P_i$ for
i = 1, 2, ..., n (equation 40 in Chapter III).

Step  5  -  Calculate  the  Test  Statistics.  Based  on 
the  hypothesized  CDF  and  the  BLUEs,  the  modified  K-S,  A-D, 
and  C-VM  statistics  were  next  calculated  using  equations 
(42),  (43),  and  (44). 

Step 6 - Generate 5000 Statistics. The first
five steps were repeated 5000 times to generate 5000
independent K-S, A-D, and C-VM statistical values
$X_1, X_2, \ldots, X_{5000}$.


Step 7 - Find the Critical Values. For each of
the three tests, the 5000 statistics were ordered as in step
2. Using the median ranks plotting position technique
(equation 51), the 80th, 85th, 90th, 95th, and 99th
percentiles of the distributions of each test statistic were
calculated by linear interpolation. These percentiles
correspond, respectively, to the .20, .15, .10, .05, and .01
levels of significance and served as the critical values for
the modified K-S, A-D, and C-VM goodness-of-fit tests. The
specific step-by-step process was to:

a. Use the IMSL subroutine VSRTA to order the
5000 test statistics, thus forming the 5000 order statistics
$X_{(1)}, X_{(2)}, \ldots, X_{(5000)}$.

b. Use equation (51) to compute the 5000
plotting positions $Y_1, Y_2, \ldots, Y_{5000}$. Also, set $Y_0 = 0$ and
$Y_{5001} = 1$.

c. Use equations (52), (53), and (54) to find
$X_{(0)}$. Similarly, use equations (55), (56), and (57) to find
$X_{(5001)}$.

d. For a given p, find the largest $Y_j$ such that
$Y_j \le p$; then use equations (58), (59), and (60) to find the
critical value $C_p$ representing the 100(p)th percentile.
Repeat this step for p = .80, .85, .90, .95, and .99.

Step 8 - Repeat for Sample Sizes. To evaluate
the effect of sample size on the critical values, steps 1
through 7 were repeated for each sample size n. This thesis
followed the common practice (9:15) of using sample sizes of
n equal to 5, 10, 15, 20, 25, and 30.

Step 9 - Repeat for Shape Parameters. Steps 1
through 8 were repeated for specified shape parameters 0.5,
1.0, 1.5, 2.0, 2.5, 3.0, 3.5, and 4.0. The critical values
were then arranged into tabular form and appear in Chapter V,
Tables VI - VIII.
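
The nine steps above can be outlined in Python as a rough sketch of the simulation loop for the K-S statistic. Several pieces are assumptions of this sketch rather than the thesis method: the Pareto CDF is taken in the location-scale form $F(x) = 1 - [1 + (x - a)/b]^{-c}$ used later in this chapter; the deviates are generated with a = b = 1; the K-S distance is written in its standard textbook form; and, most importantly, the BLUEs of Chapter III are replaced by a placeholder that simply returns the true parameters. All function names are hypothetical.

```python
import random


def pareto_inverse_cdf(u, a, b, c):
    """Invert F(x) = 1 - [1 + (x - a)/b]**(-c) for 0 <= u < 1."""
    return a - b + b * (1.0 - u) ** (-1.0 / c)


def pareto_cdf(x, a, b, c):
    """Pareto CDF in the assumed location-scale form."""
    if x <= a:
        return 0.0
    return 1.0 - (1.0 + (x - a) / b) ** (-c)


def ks_statistic(p):
    """Standard K-S distance for CDF values p at the ordered sample."""
    n = len(p)
    d_plus = max((i + 1) / n - p[i] for i in range(n))
    d_minus = max(p[i] - i / n for i in range(n))
    return max(d_plus, d_minus)


def true_parameters(ordered_sample, c):
    """PLACEHOLDER for the BLUEs of Chapter III (not reproduced here)."""
    return 1.0, 1.0


def simulate_ks(n, c, reps, estimate=true_parameters, seed=1985):
    """Return reps ordered K-S statistics for sample size n and shape c."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):                                    # step 6
        x = sorted(pareto_inverse_cdf(rng.random(), 1.0, 1.0, c)
                   for _ in range(n))                        # steps 1-2
        a_hat, b_hat = estimate(x, c)                        # step 3
        p = [pareto_cdf(xi, a_hat, b_hat, c) for xi in x]    # step 4
        stats.append(ks_statistic(p))                        # step 5
    return sorted(stats)                                     # step 7a


# Steps 8-9 in miniature: loop over a few sample sizes and shapes.
for c in (0.5, 1.0):
    for n in (5, 10):
        s = simulate_ks(n, c, reps=200)
        print(c, n, round(s[round(0.95 * len(s)) - 1], 3))
```

Because the placeholder uses the true parameters instead of estimates, the numbers this sketch prints are not comparable to the modified-test critical values in Tables VI - VIII; it only shows the order of operations.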

Stage  2:  Comparing  Power.  The  second  stage  of  the 
research  compared  the  powers  of  the  modified  K-S,  A-D,  and 
C-VM  tests  against  the  Chi-square  to  determine  which  test  can 
best  detect  a  false  Pareto  distribution  hypothesis.  As 
explained  in  Chapter  II,  the  power  of  a  statistical  test  is 
the  probability  of  correctly  rejecting  a  false  null 
hypothesis.  The  null  hypothesis  that  a  set  of  sample 
deviates  follows  a  Pareto  distribution  with  a  specified  shape 
parameter  was  tested  against  the  alternative  hypothesis  that 
the  sample  deviates  follow  some  other  distribution: 

$H_0$: Sample deviates follow a Pareto CDF with shape c

$H_1$: They follow some other distribution

For this thesis, the power study was conducted for both c = 1
and c = 3.5 in the null hypothesis.

The Chi-square portion of the study was performed as
described by Banks and Carson (5:352-356) using five
equiprobable (i.e., p = .20) class intervals (or cells) with
expected frequencies of 3 observations per cell for n = 15
and 5 per cell for n = 25. The endpoints of each cell were
computed from the Pareto CDF (equation 15) as follows:

$$F(e_i) = 1 - [1 + (e_i - a)/b]^{-c} \tag{61}$$

where $e_1, e_2, e_3, e_4$ represent the right endpoints (maximum
value) of the first four cells. Since $F(e_i)$ is the
cumulative area from 0 to $e_i$, then $F(e_i) = ip = .2i$, so
equation (61) leads to:

$$
\begin{aligned}
.2i &= 1 - [1 + (e_i - a)/b]^{-c} \\
[1 + (e_i - a)/b]^{-c} &= 1 - .2i \\
1 + (e_i - a)/b &= (1 - .2i)^{-1/c} \\
b + e_i - a &= b(1 - .2i)^{-1/c} \\
e_i &= a - b + b(1 - .2i)^{-1/c}
\end{aligned}
$$

After substituting the BLUEs for location and scale into this
last expression, the right endpoints were found by:

$$e_i = \hat{a} - \hat{b} + \hat{b}(1 - .2i)^{-1/c} \tag{62}$$
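
As a quick check on equation (62), the Python sketch below computes the four right endpoints and confirms that the CDF of equation (61) evaluates to .2, .4, .6, and .8 at them. The function names and the numerical values of the estimates are illustrative assumptions only, not values from the thesis.

```python
def pareto_cdf(x, a, b, c):
    """Equation (61): F(x) = 1 - [1 + (x - a)/b]**(-c)."""
    return 1.0 - (1.0 + (x - a) / b) ** (-c)


def cell_endpoints(a_hat, b_hat, c, cells=5):
    """Right endpoints e_1..e_(cells-1) of equiprobable cells, equation (62)."""
    return [a_hat - b_hat + b_hat * (1.0 - i / cells) ** (-1.0 / c)
            for i in range(1, cells)]


# Illustrative estimates only (not values from the thesis).
ends = cell_endpoints(a_hat=2.0, b_hat=3.0, c=3.5)
for i, e in enumerate(ends, start=1):
    # Each line shows i, e_i, and F(e_i), which should be close to 0.2 * i.
    print(i, round(e, 4), round(pareto_cdf(e, 2.0, 3.0, 3.5), 4))
```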


Assuming a true Pareto null hypothesis, the four endpoints
$e_1, e_2, e_3, e_4$ essentially divide the real line into five
equiprobable class intervals. Given a random sample, the
number of observations occurring within each cell was
counted. The Chi-square test statistic was then computed by
(5:350):

$$\chi^2 = \sum_{i=1}^{5} \frac{(O_i - E)^2}{E} \tag{63}$$

where $O_i$ is the number of observations occurring in cell i and
E = n/5 is the expected frequency in each interval. The
distribution of this test statistic approximately follows a
chi-square CDF with s-1-k degrees of freedom (13:194) where s
is the number of cells (i.e., s = 5) and k is the number of
parameters estimated from the sample (i.e., k = 2).
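
The counting and summation in equation (63) can be illustrated with a short Python sketch. The function name, the endpoints, and the 15-value sample below are all made up for the illustration; an observation equal to an endpoint is assigned to the cell for which that endpoint is the maximum value.

```python
import bisect


def chi_square_statistic(sample, endpoints):
    """Equation (63) for equiprobable cells bounded by sorted right endpoints."""
    cells = len(endpoints) + 1
    counts = [0] * cells
    for x in sample:
        # Index of the first endpoint >= x, i.e. the cell containing x.
        counts[bisect.bisect_left(endpoints, x)] += 1
    expected = len(sample) / cells
    stat = sum((o - expected) ** 2 / expected for o in counts)
    return stat, counts


# Made-up endpoints and sample (n = 15, so E = 3 per cell).
endpoints = [1.25, 5.0 / 3.0, 2.5, 5.0]
sample = [1.01, 1.05, 1.1, 1.2, 1.24, 1.3, 1.5, 1.6,
          1.8, 2.0, 2.4, 3.0, 4.0, 4.5, 9.0]
stat, counts = chi_square_statistic(sample, endpoints)
print(counts, round(stat, 4))   # counts are [5, 3, 3, 3, 1]
```

For this made-up sample the statistic is 8/3, which is below the standard tabled Chi-square value of 5.991 for 2 degrees of freedom at the .05 level, so the hypothesis would not be rejected against that table.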

Using the IMSL subroutines GGWIB, GGAMR, GGBTR, GGEXN,
and GGNML, random samples of size n were generated from
several different distributions. The alternate distributions
used were, respectively, the Weibull at shape parameter 3.5,
the Gamma at shape parameter 2.0, the Beta at parameters P =
2 and Q = 3, the exponential with mean = 2, and the normal
distribution. Also tested were three sets of Pareto deviates
generated by a FORTRAN subroutine. The first Pareto deviate
set was generated using a = b = c = 1.0; the second set used
a = 2, b = 3, and c = 3.5; the third used a = 10, b = 5, and
c = 2.0. Five thousand random samples of size n were
generated for each of the alternate distributions.

The K-S, A-D, C-VM, and Chi-square test statistics were
then calculated under the null hypothesis that the random
deviates follow the Pareto distribution with specified shape
c = 1.0 or 3.5. To determine whether to reject the null
hypothesis,  the  calculated  K-S,  A-D,  and  C-VM  statistics  were 
compared  to  the  corresponding  critical  value  obtained  in 
stage  one.  The  computed  Chi-square  test  statistic  was 
compared  against  two  sets  of  critical  values.  The  first  set 
was  taken  from  a  standard  table  of  Chi-square  critical  values 
(13:432)  based  on  2  degrees  of  freedom.  The  second  set  of 
critical  values  was  generated  by  using  equations  (62)  and 
(63)  and  applying  the  9-step,  5000-repetition  Monte  Carlo 
procedure  described  in  the  previous  section. 

This  procedure  of  comparing  test  statistics  against 
critical  values  was  repeated  5000  times  for  each  distribution 
and  test.  The  number  of  times  each  statistic  exceeded  the 
respective  critical  value  was  counted  for  each  sample  size. 
This  total,  representing  the  number  of  rejections  of  the  null 
hypothesis,  was  divided  by  the  total  number  of  tests 
performed  (5000),  to  yield  an  hypothesis  rejection  quotient. 
For a random sample generated from the hypothesized Pareto
distribution, the quotient represents the rate of erroneous
rejection of a true null hypothesis; thus, it is expected to
be approximately the level of significance $\alpha$, which is the
probability of committing a Type I error (13:78). In those
cases involving random samples generated from an alternative
distribution, the quotient represents the power of the test,
since it approximates the probability of correctly rejecting
a false null hypothesis (13:79).

A  FORTRAN  program,  written  to  compute  the  hypothesis 
rejection  rates  and  accomplish  the  power  study,  is  contained 
in  Appendix  B.  Figure  7  in  Appendix  B  shows  how  the  program 
used  the  following  9-step  process: 




Step  1.  Use  IMSL  or  inverse  transform  to 
generate  n  random  deviates  from  a  selected  distribution. 

Step 2. Assume the null hypothesis that this set
of n deviates follows the Pareto of given shape c = 1.0.
Then perform steps 2-5 of the previous section to compute the
values of the Chi-square (eqn 63) and modified K-S, A-D, and
C-VM test statistics (eqns 42-44).

Step 3. For a given level of significance $\alpha$,
compare the test statistic value against the appropriate
critical value found in the previous section. If the test
statistic value equals or exceeds the critical value, $H_0$ is
rejected.

Step  4.  Repeat  steps  1-3  5000  times,  each  time 
using  a  different  seed  to  generate  the  deviates. 

Step 5. Count the number of times $H_0$ was
rejected and divide by 5000 to obtain the power.

Step  6.  Repeat  steps  1-5  for  each  alternative 
distribution  considered. 

Step 7. Repeat steps 1-6 for sample sizes n = 5,
15, and 25.

Step 8. Repeat steps 1-7 for $\alpha$ = .05 and .01.

Step 9. Repeat steps 1-8 using hypothesized
Pareto shape c = 3.5. The power values were then arranged
into tabular form and appear in Chapter V, Tables IX and X.
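
The rejection counting in steps 3 through 5 amounts to the short Python sketch below. The function name and the statistic values are hypothetical; in the thesis the statistics came from 5000 simulated samples per case.

```python
def rejection_rate(statistics, critical_value):
    """Fraction of statistics that equal or exceed the critical value."""
    rejections = sum(1 for s in statistics if s >= critical_value)
    return rejections / len(statistics)


# Made-up statistic values, for illustration only.
demo_stats = [0.12, 0.31, 0.18, 0.27, 0.09, 0.35, 0.22, 0.29]
print(rejection_rate(demo_stats, 0.27))   # 4 of 8 values are >= 0.27
```

When the samples are drawn from the hypothesized Pareto distribution this fraction estimates the Type I error rate; when they are drawn from an alternative distribution it estimates the power.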

Stage 3: Determining Functional Relationship. The
third and final stage of the research was to determine what
(if any) functional relationship exists between the shape
parameter and the critical values generated. This
relationship can then be used to interpolate critical values
corresponding to parameters not found in the generated
tables.

To accomplish this stage, shape parameters and critical
values were examined for linear relationships. In an attempt
to "fit" the data to a line, a linear regression was
performed using the method of least squares (13:263-271),
which minimizes the sum of the squares of the deviations of
the actual data points from the straight line of "best" fit
(5:359-363). Where applicable, the correlation coefficient
(13:250-251) was also found.

Linear regression is a capability available on many
hand calculators currently on the market, so it was
unnecessary to write a separate computer program to perform
this function. For each level of significance and sample
size, critical values from Tables VI - VIII were paired
against a corresponding Pareto shape parameter. The
regression and correlation coefficients were then obtained
manually by using the linear regression keys on a Texas
Instruments TI-55-II calculator. The results are contained
in Chapter V, Tables XI and XII.
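
For readers without such a calculator, the same least-squares computation can be sketched in Python. The function name is hypothetical and the data pairs below are made up for the illustration; they are not critical values from Tables VI - VIII.

```python
def linear_fit(xs, ys):
    """Least-squares line y = b0 + b1*x and its R^2 value."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx                    # slope
    b0 = mean_y - b1 * mean_x         # intercept
    r2 = (sxy * sxy) / (sxx * syy)    # squared correlation coefficient
    return b0, b1, r2


shapes = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
# Made-up critical values for illustration only.
values = [0.216, 0.220, 0.222, 0.225, 0.228, 0.231]
b0, b1, r2 = linear_fit(shapes, values)
print(round(b0, 4), round(b1, 4), round(r2, 3))
```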

Chapter  Summary 

The  research  for  this  thesis  was  performed  by  applying 
the  Monte  Carlo  method  using  5000  repetitions  to  generate 
critical  value  tables  and  a  power  study. 

In stage 1, random Pareto deviates were generated by
using the inverse transform technique, and 5000 test
statistics were computed for each test. The median ranks
plotting positions technique was then used to select critical
values from the 5000 test statistics. In stage 2, the powers
of the modified K-S, A-D, and C-VM tests were compared
against the power of the Chi-square test. The calculations
were performed by computer programs written to accomplish a
9-step Monte Carlo procedure. Stage 3 involved manual
calculations based on the method of least squares to find
linear relationships between shape parameters and critical
values.

The results of this research are presented in the next
chapter.




V.  RESULTS  AND  APPLICATION 




Chapter Overview

This chapter shows the results obtained from carrying
out the methodology described in Chapter IV. In response to
the three research objectives listed in Chapter I, tables of
critical values for the modified K-S, A-D, and C-VM tests are
presented. Also included are tables comparing powers of the
K-S, A-D, and C-VM statistics against the Chi-square. Tables
of regression coefficients are presented as well. The use of
the tables is explained, and an example is described.


Critical  Value  Tables 

Table VI contains critical values for the modified
Kolmogorov-Smirnov Test. The modified Anderson-Darling
critical values appear in Table VII. In Table VIII, the
modified Cramer-von Mises critical values are presented.
Critical values are presented for each level of significance
$\alpha$ = .20, .15, .10, .05, and .01; sample sizes n = 5, 10,
15, 20, 25, and 30; and Pareto shape parameters .5, 1, 1.5,
2, 2.5, 3, 3.5, and 4. It is important to note that for
shape c = 0.5, the presented critical values correspond to
sample size n = 6 instead of n = 5. As explained in Chapter
III, this exception is necessary since the BLUEs could not be
computed for the case where c = .5, n = 5.




Table VI

CRITICAL VALUES FOR THE MODIFIED KOLMOGOROV-SMIRNOV TEST

Sample size n:             5*   10   15   20   25   30
Pareto shape parameter c:  1.0  1.5  2.0  2.5  3.0  3.5

[Table body only partly legible; surviving entries in source order]

.313  .239  .236  .283  .236  .293
.217  .219  .225  .228
.134  .134  .185  .187  .191  .192
.160  .160  .163  .167  .168  .170
.144  .146  .148  .149  .153  .154
.133  .135  .135  .138  .139  .142

.294  .223
.293  .232
.298  .236
.306
.309  .242
.193  .196  .199  .203  .207
.172  .175  .176  .178  .179
.155  .157  .160  .161  .163
.142  .145  .146  .150  .14?
.187  .188  .191
.171  .170  .173
.155  .161  .159

NOTE: For shape c = 0.5, critical values correspond to
sample size n = 6 instead of n = 5.


Table  VIII 


Tables IX and X display the results of the power
analysis. For sample sizes n = 5, 15, and 25, the tables
indicate relative power of the K-S, A-D, and C-VM tests to
reject a null hypothesis when the hypothesis claims that a
random sample of data follows a Pareto distribution. For
sample sizes n = 15 and 25, the power of the Chi-square test
is also included. Table IX shows power values when the null
hypothesized Pareto CDF has shape parameter c = 1.0. In
Table X, the hypothesized shape parameter is c = 3.5. Both
tables examine power performance against eight different
distributions, including three variations of the Pareto
distribution having different sets of parameters.

The power tables are divided into two levels of
significance, $\alpha$ = .05 and .01. In Table IX, the first column
corresponds to a Pareto distribution with shape c = 1.0.
Thus, the values in the first column of Table IX approximate
the level of significance $\alpha$, since they represent rejection
rates of the null hypothesis when $H_0$ is true. Similarly in
Table X, the second column represents a true null hypothesis
since the underlying data was generated from a Pareto
distribution with shape parameter c = 3.5. Aside from these
two exceptions, all other columns represent power values since
they indicate rejection rates of the null hypothesis when $H_0$
is in fact false. A note following the tables indicates
parameters of the alternate distributions.


Table IX

POWER TEST FOR THE PARETO DISTRIBUTION
$H_0$: Pareto Distribution at Shape c = 1.0
$H_1$: The data follow another distribution

Level of Significance = .05

                           Alternate Distributions*
  n   Test   Par.1   Par.2   Par.3   Weibl   Gamma   Beta    Expon   Norml

  5   K-S    0.046   0.061   0.050   0.288   0.123   0.227   0.074   0.311
      A-D    0.048   0.014   0.022   0.007   0.006   0.008   0.009   0.007
      CVM    0.050   0.063   0.051   0.283   0.127   0.224   0.076   0.307

 15   K-S    0.048   0.145   0.107   0.979   0.657   0.933   0.290   0.979
      A-D    0.052   0.126   0.083   0.966   0.644   0.398   0.266   0.965
      CVM    0.052   0.173   0.121   0.974   0.697   0.915   0.329   0.973
      χ²     0.043   0.118   0.086   0.860   0.480   0.738   0.235   0.878

 25   K-S    0.052   0.248   0.138   [?]     0.927   1.000   0.503   1.000
      A-D    0.049   0.250   0.128   1.000   0.937   0.998   0.528   1.000
      CVM    0.050   0.256   0.143   0.999   0.926   0.996   0.504   1.000
      χ²     0.045   0.178   0.105   0.999   0.823   0.996   0.377   0.999

Level of Significance = .01

                           Alternate Distributions*
  n   Test   Par.1   Par.2   Par.3   Weibl   Gamma   Beta    Expon   Norml

  5   K-S    0.010   0.021   0.021   0.171   0.067   0.115   0.034   0.172
      A-D    0.009   0.002   0.004   0.000   0.000   0.000   0.000   0.001
      CVM    0.010   0.019   0.019   0.155   0.059   0.098   0.030   0.160

 15   K-S    0.015   0.059   0.035   0.941   0.448   0.852   0.150   0.937
      A-D    0.011   0.038   0.021   0.875   0.356   0.716   0.103   0.378
      CVM    0.016   0.062   0.034   0.906   0.439   0.777   0.139   0.910
      χ²     0.006   0.031   0.016   0.645   0.172   0.400   0.064   0.669

 25   K-S    0.010   0.086   0.039   0.999   0.774   0.992   0.250   0.998
      A-D    0.009   0.080   0.032   0.997   0.778   0.982   0.247   0.997
      CVM    0.010   0.100   0.046   0.997   0.792   0.982   0.274   0.998
      χ²     0.011   0.061   0.033   0.964   0.594   0.884   0.172   0.971

* Key to Alternate Distributions:

Par.1 - Pareto (a=1, b=1, c=1)        Gamma - Gamma (shape = 2)
Par.2 - Pareto (a=2, b=3, c=3.5)      Beta  - Beta (P=2, Q=3)
Par.3 - Pareto (a=10, b=5, c=2)       Expon - Exponential (mean = 2)
Weibl - Weibull (shape = 3.5)         Norml - Normal distribution


Table X

POWER TEST FOR THE PARETO DISTRIBUTION
$H_0$: Pareto Distribution at Shape c = 3.5
$H_1$: The data follow another distribution

Level of Significance = .05

                           Alternate Distributions*
  n   Test   Par.1   Par.2   Par.3   Weibl   Gamma   Beta    Expon   Norml

[Table body only partly legible; surviving entries in source order]

0.278  0.602
0.169  0.480
690  823  826
0.717

Level of Significance = .01

0.003
0.261  0.009  0.038  0.578
0.224  0.011  0.031  0.609
0.078  0.015  0.013  0.576
0.341  0.010  0.040  0.794
0.402  0.008  0.052  0.912
0.377  0.011  0.046  0.922
0.130  0.009  0.011  0.878
0.614

* Key to Alternate Distributions:

Par.1 - Pareto (a=1, b=1, c=1)        Gamma - Gamma (shape = 2)
Par.2 - Pareto (a=2, b=3, c=3.5)      Beta  - Beta (P=2, Q=3)
Par.3 - Pareto (a=10, b=5, c=2)       Expon - Exponential (mean = 2)
Weibl - Weibull (shape = 3.5)         Norml - Normal distribution




Linear Regression Tables

Tables XI and XII indicate the linear relationships
existing between critical values and Pareto shape parameters.
Table XI pertains to Kolmogorov-Smirnov critical values,
while Table XII pertains to Cramer-von Mises critical values.
No consistent linear relationship was identified for
Anderson-Darling critical values.

The two tables contain linear coefficients and
correlation values for each combination of sample sizes n =
10, 15, 20, 25, and 30 and levels of significance $\alpha$ = .20,
.15, .10, .05, and .01. No consistent linear relationship
could be found for sample size n = 5. Further, the linear
relationships apply only for values of the shape parameter c
in the range 1.5 ≤ c ≤ 4.0. Critical values for c < 1.5
failed to display any consistent linear trend.

Each combination of sample size and significance level
has its own linear coefficients and correlation value. In
each case, the relationship between critical value Y and
shape parameter c is given by the simple linear regression
equation $Y = b_0 + b_1 c$ where $b_0$ corresponds to the Y-axis
intercept and $b_1$ represents the slope of the described line.
The correlation value $R^2$ indicates the percent of total
variation explained by the regression line. Thus, $R^2$ is a
measure of the strength of the linear relationship, with
values near 1 indicating a strong linear tendency (13:250).


Table XI

COEFFICIENTS AND R^2 VALUES OF THE RELATIONSHIP*
BETWEEN KOLMOGOROV-SMIRNOV CRITICAL VALUES AND
PARETO SHAPE PARAMETERS FOR 1.5 ≤ c ≤ 4.0

                     Level of Significance
 n   Coeff     .20      .15      .10      .05      .01

10   b0       .2080    .2154    .2222    .2359    .2704
     b1       .0057    .0067    .0090    .0117    .0144
     R^2      0.998    0.997    0.993    0.997    0.993

15   b0       .1752    .1804    .1896    .2042    .2339
     b1       .0051    .0065    .0074    .0091    .0117
     R^2      0.977    0.993    0.999    0.990    0.987

20   b0       .1544    .1630    .1699    .1828    .2102
     b1       .0044    .0042    .0054    .0068    .0091
     R^2      0.973    0.969    0.964    0.960    0.935

25   b0       .1403    .1461    .1535    .1623    .1885
     b1       .0038    .0043    .0050    .0075    .0091
     R^2      0.980    0.991    0.963    0.994    0.964

30   b0       .1302    .1362    .1418    .1542    .1728
     b1       .0030    .0034    .0047    .0053    .0090
     R^2      0.944    0.947    0.946    0.967    0.979

*  Relationship between K-S critical values Y
and Pareto shape parameter c is approximately

Y = b0 + b1 c  where 1.5 ≤ c ≤ 4.0




Table XII

                     Level of Significance
 n   Coeff     .20      .15      .10      .05      .01

10   b0       .0741    .0825    .0915    .1089    .1556
     b1       .0045    .0050    .0067    .0095    .0137
     R^2      0.986    0.970    0.973    0.985    0.981

15   b0       .0769    .0832    .0964    .1170    .1643
     b1       .0053    .0069    .0083    .0106    .0178
     R^2      0.982    0.996    0.993    0.965    0.980

20   b0       .0805    .0905    .1031    .1252    .1833
     b1       .0047    .0051    .0065    .0089    .0135
     R^2      0.966    0.957    0.978    0.957    0.974

25   b0       .0806    .0910    .1045    .1264    .1831
     b1       .0055    .0059    .0072    .0102    .0166
     R^2      0.979    0.989    0.992    0.978    0.932

30   b0       .0834    .0936    .1116    .1372    .1907
     b1       .0047    .0055    .0054    .0074    .0161
     R^2      0.964    0.945    0.899    0.872    0.967

Relationship between C-VM critical values Y
and Pareto shape parameter c is approximately

Y = b0 + b1 c  where 1.5 ≤ c ≤ 4.0


Use of Tables


This  section  explains  how  to  use  the  research  results 
contained  in  Tables  VI  -  XII. 

Using Critical Value Tables.  The critical values
contained in Tables VI - VIII can be used to test whether a
random data sample of size n = 5, 10, 15, 20, 25, or 30
follows a three-parameter Pareto distribution having
specified shape parameter c = .5, 1, 1.5, 2, 2.5, 3, 3.5, or 4.
Given a random sample of observed data, the following steps
outline basic elements of the procedure used in testing
goodness-of-fit (13:357-367):

Step  1.  Determine  n,  the  number  of  observations 
contained  in  the  random  data  sample. 

Step  2.  Identify  the  null  and  alternative 
hypotheses  to  be  tested.  In  this  case,  the  hypothesized 
shape  parameter  c  must  also  be  specified.  Thus,  the 
hypotheses  are: 

H0:  The sample observations follow a Pareto
distribution of specified shape c.

H1:  At least one of the observations does not
follow the Pareto of shape c.

Step 3.  Determine the desired probability of
committing a Type I error, i.e., the probability of
erroneously rejecting the null hypothesis when H0 is true.
This probability is the level of significance, α (13:78).


Step 4.  Order the n observations from smallest
to largest.

Step 5.  Assume H0 is true and estimate the
unknown location and scale parameters using an invariant
estimator.  If the BLUE is selected as the estimator, and the
sample size is small, the estimates can be computed manually
from equations (34) and (35) for c = .5, 1, or 2; equations
(37) to (39) for c = 1.5; or equations (17), (18), (21),
(22), and (29) for c = 2.5, 3, 3.5, or 4.  For larger sample
sizes, or if several samples are involved, use the FORTRAN
subroutines BXVALS, BLCLE2, and BLCGT2 in Appendix A.

Step  6.  Use  the  estimates  of  location  a  and 
scale  b,  the  hypothesized  shape  c,  and  the  n  ordered  sample 
observations  to  compute  the  hypothesized  Pareto  CDF  from 
equation  (40).  Subroutine  HYPCDF  in  Appendix  A  can  be  used 
if  manual  calculations  are  not  practical. 

Step 7.  Select the type of test to be performed
and compute the corresponding test statistic.  Use equation
(42) for the modified Kolmogorov-Smirnov test, equation (43)
for the modified Anderson-Darling test, or equation (44) for
the modified Cramer-von Mises test.  Subroutine TESTAT in
Appendix A can be used to compute test statistics for all
three tests.

Step  8.  Identify  the  critical  value  from  Table 
VI,  VII,  or  VIII,  based  on  test  type,  level  of  significance, 
sample  size,  and  hypothesized  shape  parameter. 


Step 9.  Reject the null hypothesis if the value
of the test statistic exceeds the critical value.  If the
test statistic does not exceed the critical value, conclude
that there is insufficient evidence to reject the null
hypothesis (13:76).
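
Steps 4, 6, 8, and 9 can be sketched in Python as follows. This is a
minimal illustration, not the thesis's FORTRAN. The CDF form
F(x) = 1 - (1 + (x - a)/b)^(-c) is an assumption standing in for
equation (40), which is not reproduced in this section; it agrees with
the CDF values quoted in the worked example later in this chapter to
within about .002. The function names pareto_cdf and reject_null are
hypothetical, and the estimates a = .963 and b = 1.128 are copied from
that example rather than computed by the BLUE.

```python
def pareto_cdf(sample, a, b, c):
    """Hypothesized CDF at each ordered observation (assumed form of eq. 40)."""
    ordered = sorted(sample)  # Step 4: order the observations
    # Step 6: evaluate F(x) = 1 - (1 + (x - a)/b)**(-c) at each observation
    return [1.0 - (1.0 + (x - a) / b) ** (-c) for x in ordered]


def reject_null(test_statistic, critical_value):
    """Step 9: reject H0 only if the statistic exceeds the critical value."""
    return test_statistic > critical_value


# Failure times and parameter estimates quoted in the worked example.
times = [1.178, 1.127, 1.373, 1.068, 1.059, 1.010, 1.474, 4.830, 3.997, 1.799]
cdf = pareto_cdf(times, a=0.963, b=1.128, c=2.5)

# Step 8 is a manual table lookup: .265 is the Table VI value quoted in the
# example for alpha = .05, n = 10, c = 2.5, and .162 is the K-S statistic.
decision = reject_null(0.162, 0.265)
print([round(p, 3) for p in cdf], decision)
```

Keeping the decision rule separate from the CDF computation mirrors
the procedure: the same comparison applies to all three test
statistics once the matching critical value has been looked up.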

Using  Power  Comparison  Tables.  Tables  IX  and  X 
can  be  used  to  draw  conclusions  regarding  the  relative 
ability  of  a  test  to  correctly  reject  a  false  null 
hypothesis.  This  information  can  then  be  used  to  select  the 
best test for a given situation.  The higher the power value,
the lower the chance of committing a Type II error, because
the probability of erroneously accepting a false null
hypothesis is lessened (13:78).

Using Linear Regression Tables.  Tables XI and XII
can be used to estimate critical values for shape parameters
which are not specifically listed in Tables VI and VIII,
provided the hypothesized shape parameter c satisfies
1.5 ≤ c ≤ 4.0.  Given the sample size and specified level of
significance, the linear slope and intercept values contained
in Table XI can be substituted into the regression equation
Y = b0 + b1 c to find the Kolmogorov-Smirnov critical value
Y.  If the Cramer-von Mises test is involved, the values
should be taken from Table XII.


Suppose a maintenance unit wants to model the failure
rate of a certain equipment component.  Based on 10
independent random samples, the unit observes the following failure
times of the component (expressed in months following initial
use): 1.178, 1.127, 1.373, 1.068, 1.059, 1.010, 1.474,
4.830, 3.997, 1.799.  The unit desires to test the hypothesis
that the component failure times follow the Pareto
distribution with shape c = 2.5.  One specified requirement
is that the test be designed so that the probability of
erroneously rejecting a true null hypothesis must not exceed .05.

Since there are 10 random observations in the data
sample, n = 10 for this example.  The required level of
significance is α = .05.  The hypotheses are:

H0:  The observed failure times follow the Pareto
distribution of shape c = 2.5.

H1:  At least one of the observations does not
follow the Pareto of shape 2.5.

The next step is to arrange the random sample in
ascending order: 1.010, 1.059, 1.068, 1.127, 1.178, 1.373,
1.474, 1.799, 3.997, 4.830.  These values are input into
subroutine BXVALS, which yields B values of .920, .838, .754,
.668, .579, .486, .389, .285, .171, and .034.  These values
are then input into subroutine BLCGT2, which computes the
parameter estimates â = .963 and b̂ = 1.128.  Subroutine
HYPCDF is then used to compute 10 values of the hypothesized
Pareto CDF: .097, .183, .201, .288, .354, .539, .608, .750,
.962, and .976.

The values of n, c, and the hypothesized Pareto CDF are
input into subroutine TESTAT, which computes the test
statistics K-S = .162, A-D = .416, and C-VM = .042.  From
Table VI, the K-S critical value for α = .05, n = 10, and c =
2.5 is .265.  Since the test statistic does not exceed the
critical value, there is insufficient evidence to reject the
null hypothesis.  The same conclusion is reached from the A-D
and C-VM critical values (Tables VII and VIII).
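
The three statistics reported above can be reproduced from the ten
CDF values with the textbook empirical-distribution-function
formulas. The sketch assumes those standard forms match equations
(42) to (44), which are not reproduced in this section, and the name
edf_statistics is hypothetical. The sixth CDF value is taken as .539,
the reading consistent with a nondecreasing CDF.

```python
import math


def edf_statistics(cdf):
    """Return (K-S, A-D, C-VM) from CDF values at the ordered observations."""
    n = len(cdf)
    # Kolmogorov-Smirnov: largest gap between the empirical step function
    # and the hypothesized CDF, checked on both sides of each step.
    d_plus = max((i + 1) / n - p for i, p in enumerate(cdf))
    d_minus = max(p - i / n for i, p in enumerate(cdf))
    ks = max(d_plus, d_minus)
    # Anderson-Darling: weighted sum of log CDF and log survival terms.
    ad = -n - sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    ) / n
    # Cramer-von Mises: squared distance from the midpoints (2i - 1)/(2n).
    cvm = 1.0 / (12 * n) + sum(
        (p - (2 * i + 1) / (2 * n)) ** 2 for i, p in enumerate(cdf)
    )
    return ks, ad, cvm


cdf_values = [0.097, 0.183, 0.201, 0.288, 0.354,
              0.539, 0.608, 0.750, 0.962, 0.976]
ks, ad, cvm = edf_statistics(cdf_values)
print(round(ks, 3), round(ad, 3), round(cvm, 3))
```

Because the inputs are rounded to three decimals, the A-D value in
particular can differ from the reported .416 in the third decimal
place.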

Now suppose the unit wants to test the null hypothesis
that a set of n = 25 observed service times follows the
Pareto distribution of shape c = 3.35.  The analyst computes
the K-S or C-VM test statistic as before, but the critical
values are not listed for c = 3.35.  Therefore, the next step
is to determine the appropriate regression coefficients from
Table XI or XII.  For n = 25 and α = .05 the K-S
coefficients are b0 = .1623 and b1 = .0075.  The K-S critical value
is Y = b0 + b1 c = .1623 + .0075 (3.35) = .1874.
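
The interpolation just carried out by hand can be written as a small
lookup. This is only a sketch: the dictionary holds a few K-S entries
copied from Table XI rather than the whole table, and the names
KS_COEFFS and ks_critical_value are hypothetical.

```python
# (n, alpha) -> (b0, b1); a few K-S entries copied from Table XI.
KS_COEFFS = {
    (10, 0.05): (0.2359, 0.0117),
    (25, 0.05): (0.1623, 0.0075),
    (25, 0.01): (0.1885, 0.0091),
}


def ks_critical_value(n, alpha, c):
    """Estimate the K-S critical value from Y = b0 + b1*c."""
    # The regression lines are stated to hold only for 1.5 <= c <= 4.0.
    if not 1.5 <= c <= 4.0:
        raise ValueError("regression applies only for 1.5 <= c <= 4.0")
    b0, b1 = KS_COEFFS[(n, alpha)]
    return b0 + b1 * c


y = ks_critical_value(25, 0.05, 3.35)
print(round(y, 4))  # 0.1874
```

Raising an error outside the fitted range keeps the lookup from
extrapolating into c < 1.5, where no consistent linear trend was
found.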

Chapter  Summary 

This chapter presented the results of the research
conducted in response to the three objectives listed in
Chapter I.  Tables of critical values for the modified K-S,
A-D, and C-VM tests were presented.  Also included were
tables comparing powers of the K-S, A-D, and C-VM statistics
against the Chi-square.  Tables of regression coefficients
were presented as well.  The use of the tables was explained,
and an example was described.

The research results are further analyzed and discussed
in the next chapter.


Chapter Overview


This chapter discusses the results presented in Chapter
V.  Observations are made concerning the tables of critical
values, power comparisons, and regression coefficients,
including an explanation as to how the computer programs were
verified and validated.

Critical Values

The critical value tables generated for this thesis are
located in Chapter V.  For the K-S test (Table VI), the
critical values for a given level of significance and shape
parameter decrease as the sample size increases.  Further, the
size of the decrease becomes smaller at larger values of n.
This trend suggests that the K-S critical values may converge
to a lower limit as the sample size increases.  However, the
use of sample sizes larger than 30 would have required much
more computer processing time, and thus was beyond the scope
of this thesis.  The A-D critical values (Table VII) exhibit
a different pattern.  The values for each combination of
significance level and shape parameter generally decrease
from n = 5 to 20 and increase from n = 20 to 30, suggesting a
convergence between 15 and 20.  Similarly, the C-VM critical
values (Table VIII) appear to converge between n = 25 and 30,
since the values consistently decrease until n = 30, then
begin to increase.

An important observation is made when the table of
modified K-S values is compared to a standard (unmodified)
K-S table (13:462).  For each value of n in Table VI, the
critical values for shape 1 or 2 at a .05 significance level
are nearly the same as the critical values for a .20
significance level using the standard table.  Thus the result of
using the standard K-S table when location and scale
parameters are estimated would be to obtain an extremely
conservative test in the sense that the actual significance level
would be much lower than that given by the standard table.

Power  Comparison 

The power comparison tables generated for this thesis
are located in Chapter V.  Values in Table IX pertain to a
null hypothesis for which the Pareto shape parameter is 1.0,
whereas in Table X the hypothesized shape parameter is 3.5.
Both tables are divided into two sections based on a level of
significance of .05 or .01.  It is obvious from the tables
that none of the three tests developed in this thesis is very
powerful when the sample size is only five.  Nevertheless,
they at least provide some means of testing goodness-of-fit
for sample sizes which are too small to use the Chi-square
test.  For sample sizes of 15 or 25, the powers improve
dramatically.


For each alternative distribution the three tests
tended to be more powerful than the Chi-square.  Two sets of
Chi-square critical values were examined.  The first set of
values was taken from a standard table of Chi-square critical
values corresponding to 2 degrees of freedom (13:432).  After
completing 5000 Monte Carlo repetitions, it was discovered
that the tabled Chi-square value for a level of significance
of .05 displayed a probability of a Type I error (i.e.,
rejecting H0 when true) of .10, which was twice the claimed
level of significance of .05.  Similarly, the probability of
Type I error for a claimed level of significance .01 was, in
fact, .02.  This discrepancy was due to the fact that the
tabled Chi-square values represent only an approximation of
the actual asymptotic distribution of the Chi-square, so that
the actual value lies somewhere between Chi-square with 2
degrees of freedom and Chi-square with 4 degrees of freedom
(34:401-402).  Since the Type I errors were twice their
expected value, a second set of Chi-square critical values
was generated using Monte Carlo simulation in the same manner
as was used to generate critical values for the K-S, A-D, and
C-VM tests.  As is apparent from Tables IX and X, the second
set of Chi-square values displays Type I error rates which are
much closer to the claimed level of significance of .05 or .01.
Therefore, these values were used in the power comparison
tables rather than the less accurate values stemming from the
standard Chi-square table.




The modified K-S, A-D, and C-VM tests are especially
powerful when the sample data are taken from the Weibull, the
Beta, or the normal distribution.  On the other hand, the
three tests display relatively low power in their ability to
distinguish against the exponential or the Pareto with
different shape parameters.  In general, the K-S test has
higher power than the others when the hypothesized shape
parameter is 1.0.  When the shape parameter is 3.5, the C-VM
test tends to be more powerful.  Next to the Chi-square, the
A-D test appears to have the lowest power in most cases.

Regression Analysis

The regression tables generated for this thesis are
also located in Chapter V.  Table XI contains regression
coefficients and correlation values for the modified
Kolmogorov-Smirnov test, while Table XII contains regression
values for the Cramer-von Mises test.

It  is  apparent  from  Tables  VI  and  VIII  that  for  a  given 
significance  level  and  sample  size  except  n  =  5,  the  K-S  and 
C-VM  critical  values  decrease  from  shape  parameter  0.5  to 
1.5,  then  steadily  increase  for  shapes  1.5  to  4.0.  Using  the 
method  of  least  squares,  a  simple  linear  regression  analysis 
was  performed  on  the  critical  values.  The  correlation  of 
regression  on  the  shape  parameter  interval  0.5  to  1.5  was  in 
most cases less than .80.  However, the regression
relationships on the shape interval 1.5 to 4.0 showed very
strong correlation (.97 or higher in most cases).  Therefore,
regression coefficients corresponding to the interval
1.5 ≤ c ≤ 4.0 were included in Tables XI and XII.

No consistent linear trend could be identified for the
Anderson-Darling critical values.  In general the values seem
to decrease on the interval 0.5 ≤ c ≤ 2.5 and then increase
on 2.5 ≤ c ≤ 4.0.  However, when least squares regression
was applied to the two intervals, the correlation values
tended to be less than .80 in most cases.  Therefore, it was
decided not to include a regression table for the A-D test.
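
The least-squares fits described in this section reduce to three sums
of squares. The sketch below shows one way to compute b0, b1, and R^2;
the name fit_line is hypothetical. Because the Table VI critical
values are not reproduced in this section, the input points are
generated from the Table XI line for n = 25 and α = .05, so the fit
recovers that line exactly and R^2 comes out at essentially 1. The
points are placeholders for checking the arithmetic, not thesis data.

```python
def fit_line(xs, ys):
    """Simple linear regression by least squares; returns (b0, b1, r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx                  # slope
    b0 = mean_y - b1 * mean_x       # Y-axis intercept
    r2 = sxy ** 2 / (sxx * syy)     # share of variation explained
    return b0, b1, r2


# Shape parameters on the fitted interval 1.5 <= c <= 4.0.
shapes = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
# Placeholder inputs: points lying exactly on Y = .1623 + .0075 c.
values = [0.1623 + 0.0075 * c for c in shapes]

b0, b1, r2 = fit_line(shapes, values)
print(round(b0, 4), round(b1, 4), round(r2, 3))
```

With real critical values in place of the placeholder points, an R^2
below about .80 would correspond to the cases the thesis excluded from
the regression tables.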

Verification and Validation

The  critical  values  were  computed  by  the  CRITVAL 
program  and  associated  subroutines  contained  in  Appendix  A. 
The  power  study  was  conducted  using  the  POWER  program  and 
subroutines  in  Appendix  B.  The  purpose  of  verification  was 
to  ensure  that  the  concepts  and  equations  developed  in  this 
thesis  were  reflected  accurately  in  the  computer  code.  The 
five  verification  techniques  suggested  by  Banks  and  Carson 
(5:379)  were  implemented  as  follows: 

1.  Have the code checked.  The code was checked
by two individuals knowledgeable in FORTRAN programming.  One
of  the  individuals,  Charek,  was  also  very  familiar  with  the 
logic  required  for  computing  parameter  estimates  for  the 
Pareto  distribution,  since  he  too  has  conducted  extensive 
research  in  this  area  (12). 


2.  Make a flow diagram.  Flow diagrams
illustrating the logic involved in generating critical values
served as the basis of the program and were closely followed
during the actual writing of the program.  The diagrams are
included in Appendices A and B.

3.  Examine a wide variety of output.  The
output of each subroutine and the results of each individual
computation were checked through extensive use of print
statements.  Each computational stage was checked at least
once against manual calculations to ensure the expected
values were produced.  A pre-production run involving 50
replications was thoroughly examined for reasonableness prior
to the final production run of 5000 replications.

4.  Print the input parameters.  During the test
runs, input parameters were printed before and after each
calculation to ensure against any inadvertent alteration of
parameters.

5.  Make the code self-documenting.  Extensive
comments have been incorporated into the programs and
subroutines to allow easy interpretation of the logic.  At
the beginning of each program component, every variable is
defined and the purpose explained.

Validation of the computer programs was provided in the
results of the power study.  For each hypothesized shape
parameter and sample size, the K-S, A-D, and C-VM tests
displayed a Type I error rate equal to or very near the
claimed level of significance.  This fact validates the
critical values as well as the power comparison values.
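
The validation argument rests on a simple property: if a critical
value is the (1 - α) percentile of the test statistic's distribution
under the null hypothesis, then fresh samples drawn under that
hypothesis should be rejected at a rate close to α. The sketch below
illustrates the property on a deliberately simplified stand-in, a K-S
statistic for Uniform(0,1) samples with no estimated parameters, so it
does not reproduce the POWER program. The percentile is read off by
linear interpolation at the median-rank plotting positions
(i - 0.3)/(nst + 0.4) given in the CRITVAL listing in Appendix A. The
interpolation details and all function names are assumptions.

```python
import random


def ks_uniform(sample):
    """K-S statistic of a sample against the Uniform(0,1) CDF."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def percentile_from_plotting_positions(stats, prob):
    """Interpolate the `prob` quantile at median ranks (i - 0.3)/(nst + 0.4)."""
    ordered = sorted(stats)
    nst = len(ordered)
    ranks = [(i - 0.3) / (nst + 0.4) for i in range(1, nst + 1)]
    if prob <= ranks[0]:
        return ordered[0]
    for k in range(1, nst):
        if prob <= ranks[k]:
            frac = (prob - ranks[k - 1]) / (ranks[k] - ranks[k - 1])
            return ordered[k - 1] + frac * (ordered[k] - ordered[k - 1])
    return ordered[-1]


rng = random.Random(123457)      # fixed seed so the run is repeatable
n, nst, alpha = 10, 5000, 0.05

# First batch of statistics under H0: used to set the critical value.
null_stats = [ks_uniform([rng.random() for _ in range(n)]) for _ in range(nst)]
critical = percentile_from_plotting_positions(null_stats, 1.0 - alpha)

# Second, independent batch under H0: used to estimate the Type I error rate.
fresh = [ks_uniform([rng.random() for _ in range(n)]) for _ in range(nst)]
type_one_rate = sum(s > critical for s in fresh) / nst
print(round(critical, 3), round(type_one_rate, 3))
```

Using a second batch of samples matters here: scoring the critical
value against the statistics that produced it would return α by
construction and validate nothing.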

Chapter  Summary 

The results of this thesis are presented in Tables
VI-XII.  The results of the power study show that the three
tests developed in this thesis can be used with small sample
sizes and are more powerful than the Chi-square at larger
sample sizes.  The programs used to generate the tables were
thoroughly verified and validated.

Conclusions  and  recommendations  for  further  study  are 
presented  in  the  next  chapter. 


Conclusions 


The  following  conclusions  are  based  on  the  results 
contained  in  this  thesis: 

1.  The first research objective listed in Chapter I
has been successfully fulfilled.  Tables VI-VIII contain
critical values of the modified Kolmogorov-Smirnov (K-S),
Anderson-Darling (A-D), and Cramer-von Mises (C-VM) tests.
The validity of these critical values has been verified by a
Monte Carlo power study which has shown that all three tests
achieve the claimed level of significance when the null
hypothesis is true.  Therefore, each table of critical values
can be used to test whether a random sample of data follows
the three-parameter Pareto distribution with specified shape
parameter.

2.  The second research objective has also been
completed successfully.  The results of the power study are
contained in Tables IX and X.  It appears that none of the
three tests developed in this thesis is very powerful when
the sample size is only five.  For sample sizes of 15 or 25,
however, the powers improve dramatically.  For each of the
alternative distributions considered, the three tests tended
to be more powerful than the Chi-square, as expected.  The
three tests are especially powerful when the sample data are
taken from the Weibull, the Beta, or the normal distribution.
In general, the K-S test has higher power than the others
when the hypothesized shape parameter is 1.0.  When the shape
parameter is 3.5, the C-VM test tends to be more powerful.
Next to the Chi-square, the A-D test appears to have the
lowest power in most cases.

3.  Successful  completion  of  the  third  research 
objective  has  revealed  a  strong  linear  relationship  between 
shape  parameters  and  critical  values  for  the  K-S  and  C-VM 
tests.  Linear  coefficients  and  correlation  values  are 
contained  in  Tables  XI  and  XII.  However,  no  consistent 
functional  relationship  could  be  identified  for  the  A-D  test. 

Recommendations

Based on observations made during the investigation for
this thesis, the following research areas are proposed for
further study:

1.  Apply  the  techniques  used  in  this  thesis  to 
generate  modified  K-S,  A-D,  and  C-VM  tests  for  other 
distribution  functions. 

2.  Investigate whether other types of goodness-of-fit
tests  can  be  modified  through  Monte  Carlo  techniques.  For 
example,  if  the  S  statistic  of  Mann,  Scheuer,  and  Fertig  (38) 
can  be  modified  for  the  Pareto  distribution,  a  power  study 
can  be  conducted  to  determine  whether  the  S  statistic  is  more 
powerful  than  the  K-S,  A-D,  or  C-VM  tests. 


3.  Derive the maximum likelihood estimators of
location and scale for the three-parameter Pareto.

4.  Compute  critical  values  for  sample  sizes  and  Pareto 
shape  parameters  not  specifically  included  in  Tables  VI-VIII. 
For  example,  the  tables  can  be  expanded  to  include  all  sample 
sizes  from  3  to  100  and  shape  parameters  from  0.25  to  10. 

5.  Increase the accuracy of the critical values by
using various techniques (5:406-442) of experimental design
(e.g., increased repetitions, multiple batch runs,
replications, antithetic random number seeds, analysis of variance,
etc.) to reduce the inherent uncertainty and to determine the
amount of variance involved.

6.  Apply more sophisticated regression techniques to
determine the functional relationship between Pareto shape
parameters and Anderson-Darling critical values.

7.  Apply the results of this thesis to earlier studies
(Chapter III) involving the Pareto distribution.  For
example, Berger and Mandelbrot's (7) conclusion that the
Pareto can be used to model errors in communications circuits
can now be tested for goodness-of-fit.

8.  Further  investigate  potential  applications  of  the 
Pareto  distribution  as  an  accurate  model  of  actual  phenomena. 
The tests developed in this thesis contribute to the usefulness
of the Pareto distribution which, in many situations,
should  be  considered  as  a  viable  model  when  simulating  or 
testing  the  underlying  distribution  of  a  given  population. 


[Fig 6.  Procedure for Generating Critical Values.  Flowchart not
reproduced.  Box labels, in order: Computer Program CRITVAL;
Subroutine PARDEV; Subroutines BXVALS, BLCLE2, BLCGT2; Subroutine
HYPCDF; Subroutine TESTAT; Main Program DO Loop 60; Subroutine
CRTVAL; Main Program DO Loop 80; Main Program DO Loop 90.]


c***** Classroom Support Computer (CSC) - VAX 11/785 - VMS 4.1 ****
c
c****** CRITVAL PROGRAM FOR PARETO GOODNESS-OF-FIT TESTS ******
c
c******************************************************************
c******************************************************************
c**                                                              **
c**                BEGIN CRITVAL MAIN PROGRAM                    **
c**                                                              **
c******************************************************************
c
c  Ref:  Appendix A, Figure 6.
c
c==================================================================
c
c  Purpose:
c
c  1.  Generate critical value tables for the modified
c      Kolmogorov-Smirnov (K-S), Anderson-Darling (A-D),
c      and Cramer-von Mises (C-VM) tests for the three-
c      parameter Pareto distribution when location and
c      scale parameters must be estimated from sample data.
c
c  2.  Provide extensive commentary to help novice programmers
c      develop similar goodness-of-fit programs.
c      Thus, diagnostic print routines have been retained as
c      part of the commentary rather than deleted.
c
c==================================================================
c
c  Variables:
c
c    dseed = random number seed
c    c     = shape parameter
c    n     = sample size
c    nshp  = shape parameter counter (8 different values)
c    nsiz  = sample size counter (6 different values of n)
c    npct  = percentile counter (5 different percentiles)
c    nst   = number of test statistics to be used
c    it    = iteration counter (5000 repetitions required)
c    KS    = array of values of modified K-S test statistic
c    CVM   = array of values of modified C-VM test statistic
c    AD    = array of values of modified A-D test statistic
c    alpha = level of significance
c
c  Input:
c
c    nst   = number of repetitions (input at computer terminal)
c    dseed = random number seed (input at computer terminal)



c
c  Subroutines:
c
c    PARDEV - Generates n ordered Pareto deviates
c    BXVALS - Calculates B values and summations of B and Bx
c    BLCLE2 - Finds BLUEs for location and scale when c <= 2
c    BLCGT2 - Finds BLUEs for location and scale when c > 2
c    HYPCDF - Computes the Hypothesized Pareto CDF
c    TESTAT - Calculates the K-S, A-D, and C-VM test statistics
c    CRTVAL - Determines critical values from plotting positions
c
c  Calculate:
c
c    nc = n * c
c
c    Plotting Positions (Eqn 51):
c
c    Y(i) = (i - 0.3)/(nst + 0.4) for i = 1,...,nst (=5000)
c
c  Output:
c
c    KScrit = 3-D array of critical values for modified K-S test
c    ADcrit = 3-D array of critical values for modified A-D test
c    CVcrit = 3-D array of critical values for modified C-VM test
c
c  Declare Variables:
c
      common dseed,x,n,c,nc,B,D,ablu,bblu,P,pct,Bsum1,Bxsum1,
     1       Bxsum2,Bxsm2c,KS,AD,CVM,it,nsiz,nshp,npct,nst,
     1       KScrit,ADcrit,CVcrit,Y
      integer n,nsiz,nshp,it,npct,nst
      real x(30),ablu,bblu,B(30),D,KS(5000,6,8),AD(5000,6,8),
     1     CVM(5000,6,8),c,nc,Bsum1,Bxsum1,Bxsum2,Bxsm2c,P(30),
     1     KScrit(6,8,5),ADcrit(6,8,5),CVcrit(6,8,5),r(30),pct,
     1     Y(5002),alpha
      double precision dseed
c
c  ** Open Output Files to Store Computed Critical Values: **
      open (unit=7,file='CRIT',status='new')
c
c  ** Number of Test Statistics to be Used on Each Run: **
      print*,'The Monte Carlo analysis will require'
      print*,' 5000 test statistics.'
      print*,'Enter the number to be used for this run:'
      read*,nst
c




c  ** Calculate 5002 Plotting Positions on the Y-axis: **
      Y(0) = 0.0
      do 10 i = 1,nst
         Y(i) = (i - 0.3)/(nst + 0.4)
   10 continue
      Y(nst + 1) = 1.0
      print*,' '
      print*,'SELECTED MEDIAN RANKS PLOTTING POSITIONS'
      print*,' TO BE USED TO FIND CRITICAL VALUES:'
      print*,' '
      print*,'        Y(5001) = ',Y(5001)
      print*,'        Y(5000) = ',Y(5000)
      print*,'99PCT:  Y(4950) = ',Y(4950)
      print*,'95PCT:  Y(4750) = ',Y(4750)
      print*,'90PCT:  Y(4500) = ',Y(4500)
      print*,'85PCT:  Y(4250) = ',Y(4250)
      print*,'80PCT:  Y(4000) = ',Y(4000)
      print*,'        Y(0001) = ',Y(1)
      print*,'        Y(0000) = ',Y(0)
c  ** Plotting Positions Computation Complete **
      print*,'Enter random number seed or "1." for default:'
      read*,dseed
      if (dseed .eq. 1.) dseed = 123457.d00
      print*,' '
      print*,'STANDBY . . . COMPUTATIONS IN PROGRESS'
      nshp = 0
c  - Begin DO Loop 90 for Shape Parameter Values c = .5(.5)4 -
      do 90 shape = 0.5,4.0,.5
      c = shape
      nshp = nshp + 1
c  Write Headings for Output Data:
      write(7,52)
      write(7,51)
      write(7,52)
      write(7,54)
      write(7,52)
      write(7,56)
      nsiz = 0
c  - Begin DO Loop 80 for Sample Sizes n = 5(5)30 -
      do 80 nsamp = 5,30,5




      if ( (c.eq.0.5) .and. (nsamp.eq.5) ) then
c        the BLUEs do not exist, so we must let:
         n = 6
      else
         n = nsamp
      end if
      nsiz = nsiz + 1
      nc = n * c
      write(7,58)
c  - Begin DO Loop 60 for 5000 Iterations -
      do 60 it = 1,nst
c  ** Perform Steps 1 & 2 of Fig 6: **
      call PARDEV
c  ** Perform Step 3 of Figure 6: **
      call BXVALS
      if (c .le. 2.0) then
         call BLCLE2
      else
         call BLCGT2
      end if
c  ** Perform Step 4 of Figure 6: **
      call HYPCDF
c  ** Perform Step 5 of Figure 6: **
      call TESTAT
   60 continue
c  - End DO Loop 60 for 5000 Iterations -
c  ** Completes Step 6 of Figure 6 **
c  ** Perform Step 7 of Figure 6: **
c  - Begin DO Loop 70 for Percentiles -
      do 70 npct = 1,5




         call CRTVAL
c
c        -- Write CRTVAL Output to File --
         write(7,62) 1.-pct,n,c,KScrit(nsiz,nshp,npct),
     1      ADcrit(nsiz,nshp,npct),CVcrit(nsiz,nshp,npct)
c
         print*,' '
         print*,' CRITICAL VALUES FROM MAIN PROGRAM'
         print*,' pct =',pct,' n=',n,' ** c=',c
         print*,' K-S =',KScrit(nsiz,nshp,npct),
     1          ' A-D =',ADcrit(nsiz,nshp,npct),
     1          ' CVM =',CVcrit(nsiz,nshp,npct)
         print*,' '
c
 70   continue
c
c  - End DO Loop 70 for Percentiles -
c
 80   continue
c
c  - End DO Loop 80 for Sample Sizes n = 5(5)30 -
c  ** Completes Step 8 of Figure 6 **
c
 90   continue


c  -  End  DO  Loop  90  for  Shape  Parameter  Values  c  =  .5 (.5) 4  - 

c  **  Completes  Step  9  of  Figure  6  ** 

c 

C*f ************************************************************* 

c 

c  OUTPUT  INSTRUCTIONS:  The  remainder  of  the  main  program 
c  consists  of  commands  to  format  the  output  data  and  write 

c  the  data  and  headers  to  a  file  which  can  be  printed  out 

c  in  hardcopy, 
c 

C*************************************************************** 

c 

c  ***  Write  KS  Critical  Value  Tables  to  File  by  Alpha  Level:  *** 
c 

      write(7,52)
      write(7,130)
      write(7,52)
      write(7,132)
      write(7,52)
      write(7,200)
      write(7,201)
      write(7,52)
c
c  - Begin DO Loop 105 to Sort Critical Values by Alpha Level -
c
      do 105 npct = 1,5
c
         if (npct .ne. 5) alpha = .25 - (.05*npct)
         if (npct .eq. 5) alpha = .01
c
         nsiz = 0
         n = 0
c
c  - Begin DO Loop 107 to Sort Output by Sample Size -
c
         do 107 nsiz = 1,6
c
            n = 5 * nsiz
            write(7,120) alpha,n,KScrit(nsiz,1,npct),KScrit
     1         (nsiz,2,npct),KScrit(nsiz,3,npct),KScrit(nsiz,
     1         4,npct),KScrit(nsiz,5,npct),KScrit(nsiz,6,npct),
     1         KScrit(nsiz,7,npct),KScrit(nsiz,8,npct)
c
 107     continue
c
c  - End DO Loop 107 After Sorting by Sample Size -
c
         write(7,201)
c
 105  continue
c
c  - End DO Loop 105 After Sorting Output by Alpha Level -
c

c 

c  ***  Write  AD  Critical  Value  Tables  to  File  by  Alpha  Level:  *** 
c 

      write(7,52)
      write(7,140)
      write(7,52)
      write(7,142)
      write(7,52)
      write(7,200)
      write(7,201)
      write(7,52)
c
      npct = 0
c
c  - Begin DO Loop 115 to Sort Critical Values by Alpha Level -
c
      do 115 npct = 1,5
c
         if (npct .ne. 5) alpha = .25 - (.05*npct)
         if (npct .eq. 5) alpha = .01
c
         nsiz = 0
c
c  - Begin DO Loop 117 to Sort Output by Sample Size -
c
         do 117 nsiz = 1,6
            n = 5 * nsiz
            write(7,120) alpha,n,ADcrit(nsiz,1,npct),ADcrit
     1         (nsiz,2,npct),ADcrit(nsiz,3,npct),ADcrit(nsiz,
     1         4,npct),ADcrit(nsiz,5,npct),ADcrit(nsiz,6,npct),
     1         ADcrit(nsiz,7,npct),ADcrit(nsiz,8,npct)
 117     continue
c
c  - End DO Loop 117 After Sorting by Sample Size -
c
         write(7,201)
c
 115  continue
c
c  - End DO Loop 115 After Sorting Output by Alpha Level -


c  *** Write CVM Critical Value Tables to File by Alpha Level: ***
c
      write(7,52)
      write(7,150)
      write(7,52)
      write(7,152)
      write(7,52)
      write(7,200)
      write(7,201)
      write(7,52)
c
c  - Begin DO Loop 125 to Sort Critical Values by Alpha Level -
c
      do 125 npct = 1,5
c
         if (npct .ne. 5) alpha = .25 - (.05*npct)
         if (npct .eq. 5) alpha = .01
c
         nsiz = 0
         n = 0
c
c  - Begin DO Loop 127 to Sort Output by Sample Size -
c
         do 127 nsiz = 1,6
c
            n = 5 * nsiz
            write(7,120) alpha,n,CVcrit(nsiz,1,npct),CVcrit
     1         (nsiz,2,npct),CVcrit(nsiz,3,npct),CVcrit(nsiz,
     1         4,npct),CVcrit(nsiz,5,npct),CVcrit(nsiz,6,npct),
     1         CVcrit(nsiz,7,npct),CVcrit(nsiz,8,npct)
c
 127     continue
c
c  - End DO Loop 127 After Sorting by Sample Size -
c
         write(7,201)
c
 125  continue
c
c  - End DO Loop 125 After Sorting Output by Alpha Level -
c

c  Specify Format for Hardcopy Output Data and Headers:
c
 51   format(' **********************************************')
 52   format(' ')
 54   format(' ** PARETO CRITICAL VALUES FOR SHAPE C = **')
 56   format(' alpha',3X,'n',4X,'c',7X,'KS',8X,'AD',8X,'CVM')
 58   format(' - ')
 62   format(' ',T3,F3.2,I5,F6.1,3F10.4)
 120  format(' ',T3,F3.2,I3,F8.3,7F9.3)
 130  format('1',36X,'Table VI')
 132  format(20X,'CRITICAL VALUES FOR THE MODIFIED K-S TEST')
 140  format('1',36X,'Table VII')
 142  format(20X,'CRITICAL VALUES FOR THE MODIFIED A-D TEST')
 150  format('1',35X,'Table VIII')
 152  format(19X,'CRITICAL VALUES FOR THE MODIFIED C-VM TEST')
 200  format(' alpha',3X,'n',4X,'c=.5',5X,'1.0',6X,'1.5',6X,
     1       '2.0',6X,'2.5',6X,'3.0',6X,'3.5',6X,'4.0')
 201  format(81('-'))
c
      close(7)
c
      end
c
c=================================================================
c  END MAIN PROGRAM


      Subroutine PARDEV
c*****************************************************************
c**                                                             **
c**  BEGIN SUBROUTINE PARDEV                                    **
c**                                                             **
c*****************************************************************
c
c  Ref: Appendix A, Fig 6, Steps 1 & 2.
c
c=================================================================
c
c  Purpose: For a specified sample size n, generate n random
c           deviates from a Pareto distribution with location and
c           scale parameters set to one (a = b = 1) and the shape
c           parameter c set to some specified positive value.
c
c  Variables:
c     r     = array containing n random numbers
c     c     = shape parameter
c     x     = array containing n Pareto deviates
c     n     = sample size
c     dseed = random number seed
c
c=================================================================
c
c  Input: dseed = random number seed (from MAIN program)
c         c = shape parameter = .5(.5)4 (MAIN DO Loop 90)
c         n = sample size = 5(5)30 (MAIN DO Loop 80)
c
c=================================================================
c
c  IMSL Subroutines:
c
c  GGUBS - generates random numbers uniformly distributed on (0,1)
c  VSRTA - arranges a set of numbers in ascending order
c
c=================================================================
c
c  Calculate:
c
c     x(j) = (1/r(j)) ** (1/c) for j = 1,2,...,n (from eqn 48)
c
c=================================================================
c
c  Output:
c     x = array of n ordered Pareto deviates
c
c  Declare Variables:
c
      common dseed,x,n,c,nc,B,D,ablu,bblu,P,pct,Bsum1,Bxsum1,
     1       Bxsum2,Bxsm2c,KS,AD,CVM,it,nsiz,nshp,npct,nst,
     1       KScrit,ADcrit,CVcrit,Y
      real x(30),ablu,bblu,B(30),D,KS(5000,6,8),AD(5000,6,8),
     1     CVM(5000,6,8),c,nc,Bsum1,Bxsum1,Bxsum2,Bxsm2c,P(30),
     1     r(30),KScrit(6,8,5),ADcrit(6,8,5),CVcrit(6,8,5),
     1     Y(5002)
      integer n,npct
      double precision dseed
c
c  - Begin DO Loop 10 to Generate n Random Pareto Deviates -
c
      do 10 j = 1,n
c
c        Use IMSL subroutine to generate random numbers:
         call GGUBS(dseed,n,r)
c
c        Use eqn 48 to transform them to Pareto deviates:
         x(j) = (1.0/r(j))**(1.0/c)
c
 10   continue
c
c  - End DO Loop 10 after Generating n Random Deviates -
c  ** (Completes Step 1 of Figure 6) **
c
c  Use IMSL subroutine to place the deviates in ascending order:
      call vsrta(x,n)
c  ** (Completes Step 2 of Figure 6) **
c
      return
      end
c
c=================================================================
c  END SUBROUTINE PARDEV
c*****************************************************************
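The transformation in eqn 48 is an inverse-transform sampler: with a = b = 1 the Pareto CDF is F(x) = 1 - x**(-c), so x = (1/r)**(1/c) maps a uniform r to a Pareto deviate. The following is a minimal Python sketch of Steps 1 and 2, not part of the thesis program. It assumes Python's `random` module as a stand-in for IMSL GGUBS and `sorted` as a stand-in for VSRTA; the function name `pareto_deviates` is hypothetical.

```python
import random


def pareto_deviates(n, c, seed=123457):
    """Return n ordered Pareto(a=1, b=1, shape c) deviates via x = (1/r)**(1/c)."""
    rng = random.Random(seed)
    # 1 - random() lies in (0, 1], which avoids dividing by zero.
    r = [1.0 - rng.random() for _ in range(n)]
    x = [(1.0 / rj) ** (1.0 / c) for rj in r]
    # Ascending order, as VSRTA does in the listing.
    return sorted(x)


sample = pareto_deviates(10, 2.0)
```

Every deviate is at least 1, the lower bound of the support when a = b = 1.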

      Subroutine BXVALS
c*****************************************************************
c**                                                             **
c**  BEGIN SUBROUTINE BXVALS                                    **
c**                                                             **
c*****************************************************************
c
c  Ref: Appendix A, Fig. 6, Step 3.
c
c  Purpose: For a given sample size n, calculate the B values
c           used to find the BLUEs of location and scale. Also
c           find the sum of the first n-1 values of B(i). Then
c           compute the three values equal to the sums of the
c           first n-1, the first n-2, and (for c = .5, 1, or 2)
c           the first n-2/c values of B(i)x(i).
c
c  Variables: c = shape parameter
c     n = sample size
c     x = array containing n ordered Pareto deviates
c     B = array containing n values of B
c     Bsum1  = sum of B(i) values for i = 1,2,...,(n-1)
c     Bxsum1 = sum of B(i)x(i) for i = 1,2,...,(n-1)
c     Bxsum2 = sum of B(i)x(i) for i = 1,2,...,(n-2)
c     Bxsm2c = sum of B(i)x(i) for i = 1,2,...,(n-2/c)
c
c=================================================================
c
c  Input: c = shape parameter = .5(.5)4 (from MAIN DO Loop 90)
c         n = sample size = 5(5)30 (from MAIN DO Loop 80)
c         nc = n*c (from MAIN program)
c         x = ordered Pareto deviates (from PARDEV)
c
c  Calculate:
c     B(i) = [1 - 2/c(n-i+1)] * B(i-1)          (eqn 29)
c     Bsum1  = B(1) + B(2) + ... + B(n-1)
c     Bxsum1 = B(1)*x(1) + ... + B(n-1)*x(n-1)
c     Bxsum2 = B(1)*x(1) + ... + B(n-2)*x(n-2)
c     Bxsm2c = B(1)*x(1) + ... + B(n-2/c)*x(n-2/c)
c
c  Output:
c     B = array containing n values of B
c     Bsum1  = sum of first (n-1) B values
c     Bxsum1 = sum of first (n-1) B*x values
c     Bxsum2 = sum of first (n-2) B*x values
c     Bxsm2c = sum of first (n-2/c) B*x (if 2/c is integer)
c
c  Declare Variables:
c
      common dseed,x,n,c,nc,B,D,ablu,bblu,P,pct,Bsum1,Bxsum1,
     1       Bxsum2,Bxsm2c,KS,AD,CVM,it,nsiz,nshp,npct,nst,
     1       KScrit,ADcrit,CVcrit,Y
      real x(30),ablu,bblu,B(30),D,KS(5000,6,8),AD(5000,6,8),
     1     CVM(5000,6,8),c,nc,Bsum1,Bxsum1,Bxsum2,Bxsm2c,P(30),
     1     KScrit(6,8,5),ADcrit(6,8,5),CVcrit(6,8,5),Y(5002)
      integer n
      double precision dseed
c
c  Calculate the first B value (eqn 25):
c
      B(1) = 1.0 - 2.0/nc
c
c  - Begin DO Loop 10 to Find the 2nd thru nth B values -
c
      do 10 j = 2,n
         B(j) = B(j-1) * (1.0 - (2.0/(c*(n-j+1))))
 10   continue
c
c  - End DO Loop 10 -
c
      Bsum1 = 0
c
c  - Begin DO Loop 20 to Sum the First n-1 Values of B -
c
      do 20 k=1,(n-1)
         Bsum1 = Bsum1 + B(k)
 20   continue
c
c  - End DO Loop 20 -
c
      Bxsum1 = 0
c
c  - Begin DO Loop 30 to Sum the First n-1 Values of Bx -
c
      do 30 l=1,(n-1)
         Bxsum1 = Bxsum1 + (B(l)*x(l))
 30   continue
c
c  - End DO Loop 30 -
c
      Bxsum2 = Bxsum1 - (B(n-1)*x(n-1))
c
c  - Find Bxsm2c When 2/c is an Integer (c=.5, 1, or 2) -
c
      Bxsm2c = 0
c
      if (c .eq. 1.0) then
         Bxsm2c = Bxsum2
      else if (c .eq. 2.0) then
         Bxsm2c = Bxsum1
      else if (c .eq. 0.5) then
         Bxsm2c = Bxsum2 - (B(n-3)*x(n-3)) - (B(n-2)*x(n-2))
      end if
c
      return
      end
c
c=================================================================
c  END SUBROUTINE BXVALS
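The recurrence for B(i) and the running sums above can be sketched compactly. This is a minimal Python illustration of the same arithmetic, not part of the thesis program; the function names `b_values` and `partial_sums` and the example data are hypothetical.

```python
def b_values(n, c):
    """B(1) = 1 - 2/(n*c); B(j) = B(j-1) * (1 - 2/(c*(n-j+1))) for j = 2..n."""
    b = [1.0 - 2.0 / (n * c)]
    for j in range(2, n + 1):
        b.append(b[-1] * (1.0 - 2.0 / (c * (n - j + 1))))
    return b


def partial_sums(b, x):
    """Return (Bsum1, Bxsum1, Bxsum2) as defined in the BXVALS header."""
    n = len(b)
    bsum1 = sum(b[:n - 1])
    bxsum1 = sum(bi * xi for bi, xi in zip(b[:n - 1], x[:n - 1]))
    # Dropping the (n-1)th term leaves the sum over the first n-2 terms.
    bxsum2 = bxsum1 - b[n - 2] * x[n - 2]
    return bsum1, bxsum1, bxsum2


b = b_values(10, 2.0)
x = [1.0 + 0.1 * i for i in range(10)]  # illustrative ordered sample
bsum1, bxsum1, bxsum2 = partial_sums(b, x)
```

For n = 10 and c = 2 the first two values are B(1) = 0.9 and B(2) = 0.8.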


      Subroutine BLCLE2
c*****************************************************************
c**                                                             **
c**  BEGIN SUBROUTINE BLCLE2                                    **
c**                                                             **
c*****************************************************************
c
c  Ref: Appendix A, Figure 6, Step 3.
c
c  Purpose: Given an ordered sample of size n and specified shape
c           c<=2, calculate the BLUEs of location a and scale b.
c
c  Variables:
c     x      = array containing n ordered Pareto deviates
c     c      = shape parameter
c     n      = sample size
c     B      = array of B values used to calculate the BLUEs
c     nc     = product of n and c
c     Coef1  = coefficient used to compute BLUE of location a
c     Coef2  = coefficient used to compute BLUE of location a
c     Coef3  = coefficient used to compute BLUE of scale b
c     Bxsum2 = sum of B(i)*x(i) terms for i = 1,...,n-2
c     Bxsm2c = sum of B(i)*x(i) terms for i = 1,...,n-2/c
c     ablu   = BLUE of the location parameter a
c     bblu   = BLUE of the scale parameter b
c     U      = value used to compute BLUEs when c = 1.5
c     Termi  = terms used to compute U (i=1,2,3)
c
c  Input: x = array of n ordered Pareto deviates (from PARDEV)
c         c = shape parameter = 0.5, 1.0, 1.5, or 2.0
c         n = sample size = 5(5)30 (from MAIN DO Loop 80)
c         nc = n*c (from MAIN program)
c         B = array containing n values of B (from BXVALS)
c         Bxsum2 = sum of first n-2 B*x values (from BXVALS)
c         Bxsm2c = sum of first n-2/c B*x values (from BXVALS)
c
c=================================================================
c
c  Calculate (if c = 0.5, 1, or 2):
c
c     Coef1 = [(c+1)*(c+2)] / [(nc-2)*(nc-c-2)]
c     Coef2 = (nc-2) / (c+2)
c     ablu = x(1) - Coef1 * [Bxsm2c - (Coef2*x(1))]     (eqn 34)
c     bblu = (nc-1) * [x(1) - ablu]                     (eqn 35)
c
c  Calculate (if c = 1.5):
c
c     Term1 = (nc-2) * (nc-c-2)
c     Term2 = nc * (c-2) * B(n-1)
c     Term3 = (nc-1) * (c+2)
c     Coef3 = [(nc-1)/nc] * (nc-2-U)
c     U = (Term1 - Term2) / Term3                       (eqn 39)
c
c     ablu = x(1) - bblu / (nc-1)                       (eqn 37)
c     bblu = (1/U) * [(c+1)*(Bxsum2) + (2c-1)*B(n-1)*x(n-1)
c                     - Coef3 * x(1)]                   (eqn 38)
c
c  Output:
c     ablu = BLUE of location parameter a
c     bblu = BLUE of scale parameter b
c
c  Declare Variables:
c
      common dseed,x,n,c,nc,B,D,ablu,bblu,P,pct,Bsum1,Bxsum1,
     1       Bxsum2,Bxsm2c,KS,AD,CVM,it,nsiz,nshp,npct,nst,
     1       KScrit,ADcrit,CVcrit,Y
      integer n
      real x(30),ablu,bblu,B(30),D,c,nc,Bsum1,Bxsum1,Bxsum2,
     1     Bxsm2c,P(30),Term1,Term2,Term3,Coef1,Coef2,Coef3,U,
     1     KScrit(6,8,5),ADcrit(6,8,5),CVcrit(6,8,5),Y(5002)
      double precision dseed
c
      if ((c.eq.0.5) .or. (c.eq.1.0) .or. (c.eq.2.0)) then
         Coef1 = ((c+1.0)*(c+2.0)) / ((nc-2.0)*(nc-c-2.0))
         Coef2 = (nc-2.0) / (c+2.0)
         ablu = x(1) - Coef1 * (Bxsm2c - (Coef2*x(1)))
         bblu = (nc-1.0) * (x(1) - ablu)
c
      else if (c .eq. 1.5) then
         Term1 = (nc-2.0) * (nc-c-2.0)
         Term2 = nc * (c-2.0) * B(n-1)
         Term3 = (nc-1.0) * (c+2.0)
         U = (Term1 - Term2) / Term3
         Coef3 = ((nc-1.0)/nc) * (nc-2.0-U)
         bblu = (1.0/U) * ( (c+1.0) * (Bxsum2)
     1          + (2.0*c-1.0)*B(n-1)*x(n-1) - Coef3 * x(1) )
         ablu = x(1) - (bblu / (nc-1.0))
c
      end if
c
      return
      end
c
c  END SUBROUTINE BLCLE2
c*****************************************************************


      Subroutine BLCGT2
c*****************************************************************
c**                                                             **
c**  BEGIN SUBROUTINE BLCGT2                                    **
c**                                                             **
c*****************************************************************
c
c  Ref: Appendix A, Figure 6, Step 3.
c
c=================================================================
c
c  Purpose: Given an ordered sample of size n and a specified
c           shape c > 2, calculate the best linear unbiased
c           estimates (BLUEs) of location and scale.
c
c=================================================================
c
c  Variables: x = array containing n ordered Pareto deviates
c     c      = shape parameter
c     n      = sample size
c     nc     = product of n and c
c     B      = array of B values used to calculate the BLUEs
c     Bsum1  = sum of B(i) terms for i = 1,...,n-1
c     Bxsum1 = sum of B(i)*x(i) terms for i = 1,...,n-1
c     D      = value used to calculate the BLUEs
c     YV     = value used to calculate the BLUEs
c     ablu   = BLUE for location parameter a
c     bblu   = BLUE for scale parameter b
c
c  Input: x = array of ordered Pareto deviates (from PARDEV)
c         c = shape parameter = 2.5, 3.0, 3.5, or 4.0
c         n = sample size = 5(5)30 (from MAIN DO Loop 80)
c         nc = n*c (from MAIN Program)
c         B = array of B values (from BXVALS)
c         Bsum1  = sum of first (n-1) B values (from BXVALS)
c         Bxsum1 = sum of first n-1 B*x values (from BXVALS)
c
c  Calculate:
c     D = [(c+1) * Bsum1] + [(c-1) * B(n)]                 (eqn 21)
c     YV = (c+1)*Bxsum1 + (c-1)*B(n)*x(n) - D*x(1)         (eqn 22)
c     ablu = x(1) - YV/[(nc-1)*(nc-2) - D*nc]              (eqn 17)
c     bblu = (nc-1) * [ x(1) - ablu ]                      (eqn 18)
c
c  Output: ablu = BLUE for location a
c          bblu = BLUE for scale b
c
c  Declare Variables:
c
      common dseed,x,n,c,nc,B,D,ablu,bblu,P,pct,Bsum1,Bxsum1,
     1       Bxsum2,Bxsm2c,KS,AD,CVM,it,nsiz,nshp,npct,nst,
     1       KScrit,ADcrit,CVcrit,Y
      integer n
      real x(30),ablu,bblu,B(30),D,KS(5000,6,8),AD(5000,6,8),YV,
     1     CVM(5000,6,8),c,nc,Bsum1,Bxsum1,Bxsum2,Bxsm2c,P(30),
     1     KScrit(6,8,5),ADcrit(6,8,5),CVcrit(6,8,5),r(30),
     1     Y(5002)
      double precision dseed
c
      D = ((c+1.0) * Bsum1) + ((c-1.0) * B(n))
      YV = ((c+1.0)*Bxsum1) + ((c-1.0)*B(n)*x(n)) - (D*x(1))
      ablu = x(1) - YV/((nc-1.0)*(nc-2.0) - (D*nc))
      bblu = (nc-1.0) * (x(1) - ablu)
c
      return
      end
c
c=================================================================
c  END SUBROUTINE BLCGT2
c*****************************************************************
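The c > 2 estimates (eqns 17, 18, 21, 22) combine the B values and the ordered sample linearly. The sketch below repeats that arithmetic in Python as an illustration only, not part of the thesis program; the function name `blue_c_gt_2` and the sample values are hypothetical. Because the estimators are linear with D chosen to cancel any constant shift, rescaling the data as s*x + t should map ablu to s*ablu + t and bblu to s*bblu.

```python
def blue_c_gt_2(x, c):
    """BLUEs of location and scale for an ordered sample x and shape c > 2."""
    n = len(x)
    nc = n * c
    # B values from the BXVALS recurrence (eqns 25 and 29).
    b = [1.0 - 2.0 / nc]
    for j in range(2, n + 1):
        b.append(b[-1] * (1.0 - 2.0 / (c * (n - j + 1))))
    bsum1 = sum(b[:n - 1])
    bxsum1 = sum(bi * xi for bi, xi in zip(b[:n - 1], x[:n - 1]))
    d = (c + 1.0) * bsum1 + (c - 1.0) * b[n - 1]                           # eqn 21
    yv = (c + 1.0) * bxsum1 + (c - 1.0) * b[n - 1] * x[n - 1] - d * x[0]   # eqn 22
    ablu = x[0] - yv / ((nc - 1.0) * (nc - 2.0) - d * nc)                  # eqn 17
    bblu = (nc - 1.0) * (x[0] - ablu)                                      # eqn 18
    return ablu, bblu


x = [1.0, 1.2, 1.5, 2.0, 3.5]  # illustrative ordered sample
a1, b1 = blue_c_gt_2(x, 3.0)
a2, b2 = blue_c_gt_2([2.0 * v + 5.0 for v in x], 3.0)
```

Here `a2` and `b2` come from the rescaled sample 2x + 5, so they should equal 2*a1 + 5 and 2*b1 up to rounding.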



      Subroutine HYPCDF
c*****************************************************************
c**                                                             **
c**  BEGIN SUBROUTINE HYPCDF                                    **
c**                                                             **
c*****************************************************************
c
c  Ref: Appendix A, Figure 6, Step 4.
c
c  Purpose: Given an ordered sample of size n, a specified
c           shape c, and the BLUEs of location a and scale b,
c           compute the hypothesized Pareto distribution
c           function P(i) for i = 1,2,...,n.
c
c  Variables:
c     x    = array containing n ordered Pareto deviates
c     n    = sample size
c     c    = shape parameter
c     ablu = BLUE of location a
c     bblu = BLUE of scale b
c     P    = array containing n points of the
c            hypothesized Pareto CDF
c
c  Input:
c     x = array of n ordered Pareto deviates (from PARDEV)
c     c = shape parameter = .5(.5)4 (from MAIN DO Loop 90)
c     n = sample size = 5(5)30 (from MAIN DO Loop 80)
c     ablu = BLUE of location a (from BLCLE2 or BLCGT2)
c     bblu = BLUE of scale b (from BLCLE2 or BLCGT2)
c
c  Calculate:
c     P(i) = 1 - [1 / [1 + (x(i) - ablu)/bblu] ]**c     (eqn 40)
c
c  Output:
c     P = array of n points of the hypothesized CDF
c
c  Declare Variables:
c
      common dseed,x,n,c,nc,B,D,ablu,bblu,P,pct,Bsum1,Bxsum1,
     1       Bxsum2,Bxsm2c,KS,AD,CVM,it,nsiz,nshp,npct,nst,
     1       KScrit,ADcrit,CVcrit,Y
      integer n
      real x(30),ablu,bblu,B(30),D,KS(5000,6,8),AD(5000,6,8),
     1     CVM(5000,6,8),c,nc,Bsum1,Bxsum1,Bxsum2,Bxsm2c,P(30),
     1     KScrit(6,8,5),ADcrit(6,8,5),CVcrit(6,8,5),r(30),
     1     Y(5002)
      double precision dseed
c
      do 10 i = 1,n
         P(i) = 1.0 - (1.0 + (x(i) - ablu)/bblu) ** (-c)
 10   continue
c
      return
      end
c
c  END SUBROUTINE HYPCDF


      Subroutine TESTAT
c*****************************************************************
c**                                                             **
c**  BEGIN SUBROUTINE TESTAT                                    **
c**                                                             **
c*****************************************************************
c
c  Ref: Appendix A, Figure 6, Step 5.
c
c=================================================================
c  Purpose: Given a sample size n, and the hypothesized Pareto
c           distribution function P(i), compute values of the
c           test statistics of the modified K-S, A-D, and CVM
c           goodness-of-fit tests.
c=================================================================
c  Variables:
c     n      = sample size
c     nshp   = shape parameter counter (8 values, 1-8)
c     nsiz   = sample size counter (6 values, 1-6)
c     it     = iteration counter (1-5000)
c     P      = array of n values of the hypothesized Pareto CDF
c     -
c     DP     = positive differences between EDF and CDF points
c     DM     = negative differences between EDF and CDF points
c     DPLUS  = maximum positive difference (largest DP value)
c     DMINUS = maximum negative difference (largest DM value)
c     KS     = values of the modified K-S test statistic
c     -
c     AL     = value used to calculate the A-D test statistic
c     AM     = value used to calculate the A-D test statistic
c     AN     = AL + AM
c     AAA    = values to be summed for A-D test statistic
c     SAAA   = sum of AAA values
c     AD     = values of the modified A-D test statistic
c     -
c     ACV    = squared quantities in the C-VM formula
c     SACV   = sum of the ACV values
c     CVM    = values of the modified C-VM test statistic
c
c  Input:
c     n    = sample size = 5(5)30 (from MAIN DO Loop 80)
c     P    = array of n values of hypothesized CDF (from HYPCDF)
c     it   = iteration counter (from MAIN DO Loop 60)
c     nsiz = sample size counter (from MAIN DO Loop 80)
c     nshp = shape parameter counter (from MAIN DO Loop 90)
c
c  Calculations for K-S test statistic (eqns 41 & 42):
c     DP(i) = ABS[ (i/n) - P(i) ]
c     DM(i) = ABS[ P(i) - (i-1)/n ]
c     DPLUS  = max [ DP(i) ] for i = 1,2,...,n
c     DMINUS = max [ DM(i) ] for i = 1,2,...,n
c     KS = max (DPLUS,DMINUS)
c
c  Calculations for A-D test statistic (eqn 43):
c     AL(j)  = ln (P(j))
c     AM(j)  = ln (1 - P(n+1-j))
c     AN(j)  = AL(j) + AM(j)
c     AAA(j) = (2*j - 1) * AN(j)
c     SAAA   = AAA(1) + AAA(2) + ... + AAA(n)
c     AD     = -n - (1/n) * SAAA
c
c  Calculations for C-VM test statistic (eqn 44):
c     ACV(k) = [ P(k) - (2*k - 1)/(2*n) ]**2
c     SACV   = ACV(1) + ACV(2) + ... + ACV(n)
c     CVM    = (1/(12*n)) + SACV
c
c=================================================================
c
c  Declare Variables:
c
      common dseed,x,n,c,nc,B,D,ablu,bblu,P,pct,Bsum1,Bxsum1,
     1       Bxsum2,Bxsm2c,KS,AD,CVM,it,nsiz,nshp,npct,nst,
     1       KScrit,ADcrit,CVcrit,Y
      integer n,nsiz,nshp,it
      real x(30),ablu,bblu,B(30),D,KS(5000,6,8),AD(5000,6,8),
     1     CVM(5000,6,8),c,nc,Bsum1,Bxsum1,Bxsum2,Bxsm2c,P(30),
     1     KScrit(6,8,5),ADcrit(6,8,5),CVcrit(6,8,5),r(30),
     1     DP(30),DM(30),DPLUS,DMINUS,AL(30),AM(30),
     1     AN(30),AAA(30),SAAA,ACV(30),SACV,Y(5002)
      double precision dseed
c
      DPLUS = 0
      DMINUS = 0
c
      do 5 ik = 1,30
         DP(ik) = 0
         DM(ik) = 0
 5    continue
c
c  - Compute the K-S Test Statistic (eqns 41 & 42): -
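The three statistics defined in the TESTAT header (eqns 41 through 44) can be evaluated directly from the hypothesized CDF values P(1..n). The following Python sketch works from those header formulas only; it is an illustration under that assumption, not a transcription of the subroutine body, and the function name `gof_statistics` is hypothetical.

```python
import math


def gof_statistics(p):
    """Return (KS, AD, CVM) for ordered hypothesized CDF values p[0..n-1]."""
    n = len(p)
    # K-S: largest gap between the EDF steps and the CDF (eqns 41 and 42).
    d_plus = max(abs((i + 1) / n - p[i]) for i in range(n))
    d_minus = max(abs(p[i] - i / n) for i in range(n))
    ks = max(d_plus, d_minus)
    # A-D (eqn 43): 1-based P(j) is p[j-1], and P(n+1-j) is p[n-j].
    saaa = sum((2 * j - 1) * (math.log(p[j - 1]) + math.log(1.0 - p[n - j]))
               for j in range(1, n + 1))
    ad = -n - saaa / n
    # C-VM (eqn 44).
    sacv = sum((p[k - 1] - (2 * k - 1) / (2 * n)) ** 2 for k in range(1, n + 1))
    cvm = 1.0 / (12 * n) + sacv
    return ks, ad, cvm


ks, ad, cvm = gof_statistics([0.125, 0.375, 0.625, 0.875])
```

When P(k) sits exactly at the midpoints (2k-1)/(2n), as in this example with n = 4, the squared terms vanish, so CVM reduces to 1/(12n) and KS to 1/(2n).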


      Subroutine CRTVAL
c*****************************************************************
c**                                                             **
c**  BEGIN SUBROUTINE CRTVAL                                    **
c**                                                             **
c*****************************************************************
c
c  Ref: Appendix A, Figure 6, Step 7.
c
c=================================================================
c
c  Purpose:
c
c  Given a set of 5000 values of test statistics from the
c  modified Kolmogorov-Smirnov (K-S), Anderson-Darling (A-D),
c  or Cramer-von Mises (C-VM) test, select critical values
c  by using median ranks plotting positions to compute
c  specified percentile levels.
c
c  Variables:
c     c      = shape parameter
c     n      = sample size
c     pct    = percentile value
c     nshp   = shape parameter counter (1: c=.5; 2: c=1.0;
c              3: c=1.5; 4: c=2.0; 5: c=2.5; 6: c=3.0;
c              7: c=3.5; 8: c=4.0)
c     nsiz   = sample size counter (1: n=5 or 6; 2: n=10;
c              3: n=15; 4: n=20; 5: n=25; 6: n=30)
c     npct   = percentile counter (0: pct=0; 1: pct=.80;
c              2: pct=.85; 3: pct=.9; 4: pct=.95; 5: pct=.99)
c     nst    = total number of statistics used
c     it     = iteration counter (5000 repetitions required)
c     KS     = 3D array of 5000 modified K-S test statistics
c     KS1    = 1D array of 5000 K-S test statistics
c     CVM    = 3D array of 5000 modified C-VM test statistics
c     CV1    = 1D array of 5000 C-VM test statistics
c     AD     = 3D array of 5000 modified A-D test statistics
c     AD1    = 1D array of 5000 A-D statistics
c     STAT   = 1D array of test stats (KS, AD, or CVM)
c     KScrit = array of critical values for the K-S test
c     CVcrit = array of critical values for the C-VM test
c     ADcrit = array of critical values for the A-D test
c     CRIT   = either the KS, AD, or CVM critical value array
c     Y      = array containing 5002 plotting positions
c     slpm   = array of slopes used to find critical values
c     bi     = array of intercepts used to find critical vals
c
c  Input:
c     Y    = array of plotting positions (MAIN DO Loop 10)
c     c    = shape parameter (from MAIN DO Loop 90)
c     n    = sample size (from MAIN DO Loop 80)
c     nshp = shape parameter counter (from MAIN DO Loop 90)
c     nsiz = sample size counter (from MAIN DO Loop 80)
c     npct = percentile counter (from MAIN DO Loop 70)
c     nst  = number of test statistics used (from MAIN Prog)
c     KS   = array of 5000 K-S test statistics (from TESTAT)
c     CVM  = array of 5000 C-VM test stats (from TESTAT)
c     AD   = array of 5000 A-D test statistics (from TESTAT)
c
c  IMSL Subroutine: VSRTA - orders the test statistic values
c
c  Calculate Endpoints of Test Statistics (Eqns 52 - 57):
c     slpm(0) = ( Y(2) - Y(1) ) / ( STAT(2) - STAT(1) )
c     bi(0)   = Y(1) - slpm(0) * STAT(1)
c     STAT(0) = max ( 0, - bi(0)/slpm(0) )
c     slpm(6) = (Y(5000) - Y(4999)) / (STAT(5000) - STAT(4999))
c     bi(6)   = Y(4999) - slpm(6) * STAT(4999)
c     STAT(6) = (1.0 - bi(6)) / slpm(6)
c
c  Calculate Critical Values (Eqns 58 - 60):
c     slpm(npct) = ( Y(j+1) - Y(j) ) / ( STAT(j+1) - STAT(j) )
c     bi(npct)   = Y(j) - slpm(npct) * STAT(j)
c     CRIT(npct) = (pct - bi(npct)) / slpm(npct)
c
c  Output:
c     KScrit - array of critical values for modified K-S test
c     ADcrit - array of critical values for modified A-D test
c     CVcrit - array of critical values for modified C-VM test


c  Declare Variables:
c  (STAT, slpm, and bi are indexed from 0 in the code below.)
c
      common dseed,x,n,c,nc,B,D,ablu,bblu,P,pct,Bsum1,Bxsum1,
     1       Bxsum2,Bxsm2c,KS,AD,CVM,it,nsiz,nshp,npct,nst,
     1       KScrit,ADcrit,CVcrit,Y
      integer n,nsiz,nshp,it,npct,nst,ntest
      real x(30),ablu,bblu,B(30),D,KS(5000,6,8),AD(5000,6,8),
     1     CVM(5000,6,8),c,nc,Bsum1,Bxsum1,Bxsum2,Bxsm2c,P(30),
     1     KScrit(6,8,5),ADcrit(6,8,5),CVcrit(6,8,5),r(30),
     1     Y(5002),STAT(0:5001),CRIT(6,8,7),slpm(0:6),bi(0:6),pct,
     1     KS1(5000),CV1(5000),AD1(5000)
      double precision dseed
c
      if (npct .eq. 1) pct = .80
      if (npct .eq. 2) pct = .85
      if (npct .eq. 3) pct = .90
      if (npct .eq. 4) pct = .95
      if (npct .eq. 5) pct = .99
c
c  ** Store the 3 Sets of 5000 Test Stats into 1D Arrays: **
c
      do 16 ncnt = 1,nst
         KS1(ncnt) = KS(ncnt,nsiz,nshp)
         AD1(ncnt) = AD(ncnt,nsiz,nshp)
         CV1(ncnt) = CVM(ncnt,nsiz,nshp)
 16   continue

c
c  ** Use IMSL Subroutine to Order the Test Statistics: **
c
      call VSRTA(KS1,nst)
c     print*,'ORDERED KS STATISTICS FROM CRTVAL:'
c     print*,'n=',n,' c=',c
c     do 2 jks = 1,nst
c        print*,'KS STAT =',KS1(jks)
c 2   continue
c
      call VSRTA(AD1,nst)
c     print*,'ORDERED AD STATISTICS FROM CRTVAL:'
c     print*,'n=',n,' c=',c
c     do 4 jad = 1,nst
c        print*,'AD STAT =',AD1(jad)
c 4   continue
c
      call VSRTA(CV1,nst)
c     print*,'ORDERED CVM STATISTICS FROM CRTVAL:'
c     print*,'n=',n,' c=',c
c     do 6 jcv = 1,nst
c        print*,'CV STAT =',CV1(jcv)
c 6   continue

c
c  - Begin DO Loop 20 to Rotate Through KS, AD, and CVM -
c
      do 20 ntest = 1,3
c
c  - Begin DO Loop 30 for 5000 Data Points -
c
         do 30 j = 1,nst
c
            if (ntest .eq. 1) then
               STAT(j) = KS1(j)
            else if (ntest .eq. 2) then
               STAT(j) = AD1(j)
            else if (ntest .eq. 3) then
               STAT(j) = CV1(j)
            end if
c
 30      continue
c
c  - End DO Loop 30 for 5000 Data Points -
c
c  ** Extrapolate Left Endpoint of the Test Statistics: **
         if (STAT(1) .eq. STAT(2)) then
            print*,'******************************'
            print*,'TWO LEFT ENDPOINT STATS EQUAL'
            if (ntest .eq. 1) print*,'FOR KS TEST'
            if (ntest .eq. 2) print*,'FOR AD TEST'
            if (ntest .eq. 3) print*,'FOR CVM TEST'
            print*,'n=',n,' c=',c,' pct=',pct
            print*,'STAT(1)=',STAT(1)
            print*,'STAT(2)=',STAT(2)
            print*,'******************************'
            print*,' '
            dif0 = STAT(3) - STAT(1)
            if (dif0 .eq. 0.0) dif0 = .00001
            slpm(0) = (Y(3) - Y(1)) / dif0
         else
            dif0 = STAT(2) - STAT(1)
            slpm(0) = (Y(2) - Y(1)) / dif0
         end if
         bi(0) = Y(1) - slpm(0) * STAT(1)
         STAT(0) = max ( 0.0, - bi(0)/slpm(0) )
         print*,' '
         if (ntest .eq. 1)print*,'FOR KS TEST STATISTICS'
         if (ntest .eq. 2)print*,'FOR AD TEST STATISTICS'
         if (ntest .eq. 3)print*,'FOR CVM TEST STATISTICS'
         print*,'LEFT ENDPT X(0000) =',STAT(0)
         print*,' - FIRST X(0001) =',STAT(1)
         print*,'80PCT STAT X(4000) =',STAT(4000)
         print*,'85PCT STAT X(4250) =',STAT(4250)
         print*,'90PCT STAT X(4500) =',STAT(4500)
         print*,'95PCT STAT X(4750) =',STAT(4750)
         print*,'99PCT STAT X(4950) =',STAT(4950)
         print*,' - LAST X(5000) =',STAT(5000)


c  ** Extrapolate Right Endpoint of the Test Statistic: **
c
         if (STAT(nst-1) .eq. STAT(nst)) then
            print*,'******************************'
            print*,'TWO RIGHT ENDPOINT STATS EQUAL:'
            if (ntest .eq. 1) print*,'FOR KS TEST'
            if (ntest .eq. 2) print*,'FOR AD TEST'
            if (ntest .eq. 3) print*,'FOR CVM TEST'
            print*,'n=',n,' c=',c,' pct=',pct
            print*,'STAT(4999)=',STAT(nst-1)
            print*,'STAT(5000)=',STAT(nst)
            print*,'******************************'
            print*,' '
            dif6 = STAT(nst) - STAT(nst-2)
            if (dif6 .eq. 0.0) dif6 = .00001
            slpm(6) = (Y(nst)-Y(nst-2)) / dif6
         else
            dif6 = STAT(nst) - STAT(nst-1)
            slpm(6) = (Y(nst)-Y(nst-1)) / dif6
         end if
         bi(6) = Y(nst-1) - slpm(6)*STAT(nst-1)
         STAT(nst+1) = (1.0 - bi(6)) / slpm(6)
         print*,'RGHT ENDPT X(5001) =',STAT(nst+1)

c        ** Interpolate Critical Values Between Test Stats **

c        -- Begin DO Loop 50 to Find Max Y(k) < pct: --

            do 50 kj = 1,nst
               k = nst+1 - kj

               if (Y(k) .le. pct) then

                  if (STAT(k) .eq. STAT(k+1)) then

                     print*,'*****************************'
                     print*,'TWO ADJACENT STATS EQUAL:'
                     if (ntest .eq. 1) print*,'FOR KS TEST'
                     if (ntest .eq. 2) print*,'FOR AD TEST'
                     if (ntest .eq. 3) print*,'FOR CVM TEST'
                     print*,'n=',n,'  c=',c,'  pct=',pct
                     print*,'STAT(k)=',STAT(k)
                     print*,'STAT(k+1)=',STAT(k+1)
                     print*,'*****************************'
                     print*,' '

                     dif = STAT(k+1) - STAT(k-1)
                     if (dif .eq. 0.0) dif = .00001
                     slpm(npct) = (Y(k+1)-Y(k-1)) / dif

                  else

                     dif = STAT(k+1) - STAT(k)

                     slpm(npct) = (Y(k+1)-Y(k)) / dif
                  end if
c
                  bi(npct) = Y(k) - slpm(npct) * STAT(k)
                  CRIT(nsiz,nshp,npct)
     1                  = (pct-bi(npct))/slpm(npct)
                  GOTO 75
c
               end if
c
 50         continue
c
c        -- End DO Loop 50 Upon Finding Crit Val --
c
c        ** Associate the Critical Values with Test Type: **
c
 75         if (ntest .eq. 1) then
               KScrit(nsiz,nshp,npct) = CRIT(nsiz,nshp,npct)
c              print*,'n=',n,'  c=',c,'  pct=',pct
c              print*,'CRTVAL KS Crit Val =',KScrit(nsiz,nshp,npct)
            else if (ntest .eq. 2) then
               ADcrit(nsiz,nshp,npct) = CRIT(nsiz,nshp,npct)
c              print*,'CRTVAL AD Crit Val =',ADcrit(nsiz,nshp,npct)
            else if (ntest .eq. 3) then
               CVcrit(nsiz,nshp,npct) = CRIT(nsiz,nshp,npct)
c              print*,'CRTVAL CV Crit Val =',CVcrit(nsiz,nshp,npct)
c              print*,' '
            end if
c
 20   continue
c
c  -- End DO Loop 20 After Rotating Through KS, AD, and CVM --
c
      return
      end
c
c=================================================================
c  END SUBROUTINE CRTVAL
c *****************************************************************
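
The interpolation step at the end of CRTVAL can be tried in isolation. The following is a minimal sketch in free-form Fortran, not the thesis code: the five sorted statistics in tstat and the plotting positions y(k) = k/(nst+1) are hypothetical values chosen for illustration (the definition of Y is not shown in this part of the listing), and the special handling of equal adjacent statistics is left out.

```fortran
program crit_interp
  implicit none
  integer, parameter :: nst = 5
  real :: tstat(nst), y(nst), pct, slope, bi, crit
  integer :: k

  ! Hypothetical sorted test statistics (assumption, not thesis data).
  tstat = [0.10, 0.20, 0.40, 0.70, 1.10]
  ! Assumed plotting positions; the thesis defines Y elsewhere.
  do k = 1, nst
     y(k) = real(k) / real(nst + 1)
  end do
  pct = 0.75

  ! Scan downward for the largest k (below nst) with y(k) <= pct.
  do k = nst - 1, 1, -1
     if (y(k) <= pct) exit
  end do
  if (k < 1) error stop 'pct lies below the first plotting position'

  ! Straight line through (tstat(k), y(k)) and (tstat(k+1), y(k+1)),
  ! solved for the statistic at which the line reaches pct.
  slope = (y(k+1) - y(k)) / (tstat(k+1) - tstat(k))
  bi    = y(k) - slope * tstat(k)
  crit  = (pct - bi) / slope

  print *, 'bracketing index k =', k
  print *, 'critical value     =', crit

  ! pct = 0.75 is halfway between y(4) and y(5), so the result
  ! should be halfway between 0.70 and 1.10.
  if (abs(crit - 0.9) > 1.0e-4) error stop 'unexpected result'
end program crit_interp
```

Here the slope and intercept play the roles of slpm(npct) and bi(npct) in the listing; a production version would also need the guard for STAT(k) = STAT(k+1) that CRTVAL applies.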


Fig 7.  Procedure for Determining Power Values

[Flowchart for Computer Program POWER.  Recoverable box labels:]

   STEP 1   Subroutines PARETO, GGWIB, GGAMR, GGBTR, GGEXN, GGNML
   STEP 2   Subroutines VSRTA, BXVALS, BLCLE2, BLCGT2, HYPCDF, TESTAT
   STEP 3   Subroutine COMPAR
   STEP 4   Main Program, DO Loop 40
   STEP 5   Main Program:  Divide Number of Ho Rejects by 5000
            to Determine Power of Test
   STEP 6   Main Program, DO Loop 60:  Repeat for Distributions
   STEP 7   Main Program, DO Loop 70:  Repeat for n = 5, 15, 25
   STEP 8   Main Program, DO Loop 80:  Repeat for a = .05, .01
   STEP 9   Main Program, DO Loop 90:  Repeat for c = 1.0, 3.5
   STOP



c***** Classroom Support Computer (CSC) - VAX 11/785 - VMS 4.1 ****
c
c ******  POWER PROGRAM FOR PARETO GOODNESS-OF-FIT TESTS  ******
c
c******************************************************************
c******************************************************************
c**                                                              **
c**                 BEGIN POWER MAIN PROGRAM                     **
c**                                                              **
c******************************************************************
c
c  Ref: Appendix B, Figure 7.
c
c  Purpose: Test the null hypothesis that a set of sample data
c     follows the Pareto distribution with hypothesized shape c
c     against the alternate hypothesis that the data follow some
c     other distribution.  The goals are to:
c
c     1. Compare powers of the modified Kolmogorov-Smirnov (K-S),
c     Anderson-Darling (A-D), and Cramer-von Mises (C-VM) tests
c     against the Chi-Square test to determine which test can
c     best detect a false Pareto distribution hypothesis.
c
c     2. When the Pareto null hypothesis is true, confirm that
c     the hypothesis rejection rates under the modified K-S, A-D,
c     and C-VM statistics are low enough to satisfy a claimed level
c     of significance.
c
c     3. Provide extensive commentary to assist novice programmers
c     to conduct similar power studies in statistical analysis.
c     Diagnostic print statements have been retained as commentary
c     to contribute to this goal.
c
c  Variables:
c     dseed = random number seed
c     alpha = level of significance (.01 or .05 used here)
c     n     = sample size
c     c     = null-hypothesis Pareto shape parameter
c     nshp  = null-hyp Pareto shape counter (1:c=1.0, 2:c=3.5)
c     nalf  = significance level counter (1: a=.05, 2: a=.01)
c     nsiz  = sample size counter (1:n=5, 2:n=15, 3:n=25)
c     nalt  = alternative distribution counter (8 in all)
c     nrep  = number of repetitions to be used
c     it    = iteration counter (5000 repetitions required)
c     KS    = array of values of modified K-S test statistic
c     CVM   = array of values of modified C-VM test statistic
c     AD    = array of values of modified A-D test statistic
c     X2    = array of values of Chi-square test statistic
c     nrKS  = number of hypothesis rejects under the K-S test
c     nrAD  = number of hypothesis rejects under the A-D test
c     nrCV  = number of hypothesis rejects under the CVM test
c     nrX2  = number of hypothesis rejects under Chi-square
c


c=================================================================
c
c  Input:
c
c     nrep  = number of repetitions (input at computer terminal)
c     dseed = random number seed (input at computer terminal)
c
c=================================================================
c
c  Subroutines:
c
c     PARETO - Generates n random Pareto deviates
c     BXVALS - Calculates B values and summations of B and Bx
c     BLCLE2 - Finds BLUEs for location and scale when c <= 2
c     BLCGT2 - Finds BLUEs for location and scale when c > 2
c     HYPCDF - Computes the Hypothesized Pareto CDF
c     TESTAT - Calculates the K-S, A-D, and C-VM test statistics
c     COMPAR - Compares test stats vs. crit vals and counts rejects
c
c  IMSL Subroutines:
c
c     GGWIB - Generates random Weibull deviates
c     GGAMR - Generates random Gamma deviates
c     GGBTR - Generates random Beta deviates
c     GGEXN - Generates random Exponential deviates
c     GGNML - Generates random Normal deviates
c     VSRTA - Arranges data in ascending order
c
c=================================================================
c
c  Output:
c
c     KSpwr(nshp,nalf,nsiz,nalt) = power values for K-S test
c     ADpwr(nshp,nalf,nsiz,nalt) = power values for A-D test
c     CVpwr(nshp,nalf,nsiz,nalt) = power values for C-VM test
c     X2pwr(nshp,nalf,nsiz,nalt) = power values for Chi-square
c


c  Declare Variables:
c
      common dseed,x,n,c,nc,B,D,ablu,bblu,P,Bsum1,Bxsum1,
     1       Bxsum2,Bxsm2c,KS,AD,CVM,it,nsiz,nshp,nrep,
     1       nalt,nalf,nrKS,nrAD,nrCV,nrX2,X2
      integer n,nsiz,nshp,it,nrep,nrKS(2,2,3,8),nrAD(2,2,3,8),
     1        nrCV(2,2,3,8),nrX2(2,2,3,8)
      real x(25),ablu,bblu,B(25),D,KS(2,2,3,8),AD(2,2,3,8),
     1     CVM(2,2,3,8),c,nc,Bsum1,Bxsum1,Bxsum2,Bxsm2c,
     1     P(25),r(25),alpha,KSpwr(2,2,3,8),ADpwr(2,2,3,8),
     1     CVpwr(2,2,3,8),X2crit(2,2,3),X2(2,2,3,8),
     1     X2pwr(2,2,3,8)
      character test(4)*3,altcdf(8)*12
      double precision dseed
c
      test(1) = 'K-S'
      test(2) = 'A-D'
      test(3) = 'CVM'
      test(4) = 'CHI'
c
      altcdf(1) = 'Pareto c=1.0'
      altcdf(2) = 'Pareto c=3.5'
      altcdf(3) = 'Pareto c=2.0'
      altcdf(4) = 'Weibull'
      altcdf(5) = 'Gamma'
      altcdf(6) = 'Beta'
      altcdf(7) = 'Exponential'
      altcdf(8) = 'Normal'
c
c  ** Open Output File to Store Computed Power Values: **
      open (unit=7,file='X2ALL',status='new')
c
c  ** Number of Repetitions to be Used on Each Run: **
c
      print*,'The Monte Carlo power analysis will require'
      print*,'  5000 repetitions.'
      print*,'Enter the number to be used for this run:'
      read*,nrep
c
      print*,'Enter random number seed or "1." for default:'
      read*,dseed
c
      if (dseed .eq. 1.) dseed = 123457.d00
      print*,' '
      print*,'STANDBY . . . COMPUTATIONS IN PROGRESS'

c  -  Begin DO Loop 90 for Null-Hypothesis Pareto Shape c  -

      do 90 nshp = 1,2

        if (nshp .eq. 1) then
          c = 1.0
          write(7,51)
          write(7,56)
          write(7,58)
          write(7,62)

        else if (nshp .eq. 2) then
          c = 3.5
          write(7,52)
          write(7,56)
          write(7,59)



          write(7,62)
        end if
c
c  -  Begin DO Loop 80 for Alpha Significance Levels  -
c
        do 80 nalf = 1,2
c
          if (nalf .eq. 1) then
            alpha = .05
            write(7,64)
          else if (nalf .eq. 2) then
            alpha = .01
            write(7,66)
          end if
c
          write(7,54)
          write(7,74)
          write(7,68)
          write(7,72)
          write(7,76)
          write(7,72)
c
          nsiz = 0
c
c         print*,'============================='
c         print*,'Numbers of Rejects After do 80/Before do 70'
c         print*,'c =',c,'alpha =',alpha,'n=',n,'CDF: ',altcdf(nalt)
c         print*,'KS Rejects = ',nrKS(nshp,nalf,nsiz,nalt)
c         print*,'AD Rejects = ',nrAD(nshp,nalf,nsiz,nalt)
c         print*,'CV Rejects = ',nrCV(nshp,nalf,nsiz,nalt)
c         print*,'============================='
c
c  -  Begin DO Loop 70 for Sample Sizes  -
c
          do 70 n = 5,25,10
c
            nsiz = nsiz + 1
c
            nc = n * c
c
c  --  Begin DO Loop 60 for Alternate CDFs  --
c
            do 60 nalt = 1,8
c
              nrKS(nshp,nalf,nsiz,nalt) = 0
              nrAD(nshp,nalf,nsiz,nalt) = 0
              nrCV(nshp,nalf,nsiz,nalt) = 0
              nrX2(nshp,nalf,nsiz,nalt) = 0
c
c  --  Begin DO Loop 40 for Repetitions  --
c

              do 40 it = 1,nrep

c  ** Perform Step 1 of Figure 7: **

                if (nalt .eq. 1) call PARETO
                if (nalt .eq. 2) call PARETO
                if (nalt .eq. 3) call PARETO
                if (nalt .eq. 4) call GGWIB(dseed,3.5,n,x)
                if (nalt .eq. 5) call GGAMR(dseed,2.,n,r,x)
                if (nalt .eq. 6) call GGBTR(dseed,2.,3.,n,x)
                if (nalt .eq. 7) call GGEXN(dseed,2.,n,x)
                if (nalt .eq. 8) call GGNML(dseed,n,x)

c  ** Perform Step 2 of Figure 7: **

                call VSRTA(x,n)
                call BXVALS

                if (c .eq. 1.0) call BLCLE2
                if (c .eq. 3.5) call BLCGT2

                call HYPCDF
                call TESTAT

c  ** Perform Step 3 of Figure 7: **
                call COMPAR

 40           continue

c  --  End DO Loop 40 for Repetitions  --

c  ** Completes Step 4 of Figure 7 **

c  ** Perform Step 5 of Figure 7: **

              print*,'============================='
              print*,'Numbers of Rejects Prior to Power Calculation'
              print*,'c =',c,'alpha =',alpha,'n=',n,'nalt=',nalt
              print*,'KS Rejects = ',nrKS(nshp,nalf,nsiz,nalt)
              print*,'AD Rejects = ',nrAD(nshp,nalf,nsiz,nalt)
              print*,'CV Rejects = ',nrCV(nshp,nalf,nsiz,nalt)
              print*,'X2 Rejects = ',nrX2(nshp,nalf,nsiz,nalt)
              print*,' '

              KSpwr(nshp,nalf,nsiz,nalt) =
     1             nrKS(nshp,nalf,nsiz,nalt)/real(nrep)

              ADpwr(nshp,nalf,nsiz,nalt) =
     1             nrAD(nshp,nalf,nsiz,nalt)/real(nrep)

              CVpwr(nshp,nalf,nsiz,nalt) =
     1             nrCV(nshp,nalf,nsiz,nalt)/real(nrep)


              X2pwr(nshp,nalf,nsiz,nalt) =
     1             nrX2(nshp,nalf,nsiz,nalt)/real(nrep)

              print*,'******************************************'
              print*,'     POWER VALUES FROM MAIN PROGRAM'
              print*,'  Null-hyp c =',c,'  alpha =',alpha
              print*,'  n=',n,'  Alternate CDF: ',altcdf(nalt)
              print*,'  =========================='
              print*,'  KS Rejects = ',nrKS(nshp,nalf,nsiz,nalt)
              print*,'  AD Rejects = ',nrAD(nshp,nalf,nsiz,nalt)
              print*,'  CV Rejects = ',nrCV(nshp,nalf,nsiz,nalt)
              print*,'  X2 Rejects = ',nrX2(nshp,nalf,nsiz,nalt)
              print*,'  =========================='
              print*,'  KS Power =',KSpwr(nshp,nalf,nsiz,nalt)
              print*,'  AD Power =',ADpwr(nshp,nalf,nsiz,nalt)
              print*,'  CV Power =',CVpwr(nshp,nalf,nsiz,nalt)
              print*,'  X2 Power =',X2pwr(nshp,nalf,nsiz,nalt)
              print*,'******************************************'
              print*,' '

 60         continue

c  --  End DO Loop 60 for Alternate CDFs  --

c  ** Completes Step 6 of Figure 7 **

c  Write Power Results to File

            write(7,110),n,test(1),KSpwr(nshp,nalf,nsiz,1),
     1           KSpwr(nshp,nalf,nsiz,2),KSpwr(nshp,nalf,nsiz,3),
     1           KSpwr(nshp,nalf,nsiz,4),KSpwr(nshp,nalf,nsiz,5),
     1           KSpwr(nshp,nalf,nsiz,6),KSpwr(nshp,nalf,nsiz,7),
     1           KSpwr(nshp,nalf,nsiz,8)

            write(7,110),n,test(2),ADpwr(nshp,nalf,nsiz,1),
     1           ADpwr(nshp,nalf,nsiz,2),ADpwr(nshp,nalf,nsiz,3),
     1           ADpwr(nshp,nalf,nsiz,4),ADpwr(nshp,nalf,nsiz,5),
     1           ADpwr(nshp,nalf,nsiz,6),ADpwr(nshp,nalf,nsiz,7),
     1           ADpwr(nshp,nalf,nsiz,8)

            write(7,110),n,test(3),CVpwr(nshp,nalf,nsiz,1),
     1           CVpwr(nshp,nalf,nsiz,2),CVpwr(nshp,nalf,nsiz,3),
     1           CVpwr(nshp,nalf,nsiz,4),CVpwr(nshp,nalf,nsiz,5),
     1           CVpwr(nshp,nalf,nsiz,6),CVpwr(nshp,nalf,nsiz,7),
     1           CVpwr(nshp,nalf,nsiz,8)

            write(7,110),n,test(4),X2pwr(nshp,nalf,nsiz,1),
     1           X2pwr(nshp,nalf,nsiz,2),X2pwr(nshp,nalf,nsiz,3),
     1           X2pwr(nshp,nalf,nsiz,4),X2pwr(nshp,nalf,nsiz,5),
     1           X2pwr(nshp,nalf,nsiz,6),X2pwr(nshp,nalf,nsiz,7),
     1           X2pwr(nshp,nalf,nsiz,8)
c
            write(7,72)

 70       continue

c  -  End DO Loop 70 for Sample Sizes  -

c  ** Completes Step 7 of Figure 7 **

 80     continue

c  -  End DO Loop 80 for Alpha Significance Levels  -
c  ** Completes Step 8 of Figure 7 **

        write(7,74)

 90   continue
c
c  -  End DO Loop 90 for Null-Hypothesis Pareto Shape Parameter  -
c
c******************************************************************
c
c  Specify Format for Hardcopy Output Data and Headers:
c
 51   format('1',36X,'Table XVII')
 52   format('1',35X,'Table XVIII')
 54   format(' ')
 56   format('0',22X,'POWER TEST FOR THE PARETO DISTRIBUTION')
 58   format(22X,'Ho: Pareto Distribution at Shape c = 1.0')
 59   format(22X,'Ho: Pareto Distribution at Shape c = 3.5')
 62   format(22X,'Ha: The data follow another distribution')
 64   format('0',28X,'Level of Significance = .05')
 66   format('0',28X,'Level of Significance = .01')
 68   format(35X,'Alternate Distributions')
 72   format(80('-'))
 74   format(80('='))
 76   format(2X,' n',3X,'Test',4X,'Par.1',3X,'Par.2',3X,'Par.3',3X,
     1       'Weibl',3X,'Gamma',3X,'Beta',4X,
     1       'Expon',3X,'Norml')
 110  format(' ',I3,A7,F9.3,7F8.3)
c
      close(7)
      end
c
c  END MAIN PROGRAM
c******************************************************************
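
Step 5 of Figure 7 estimates power as the fraction of repetitions in which the null hypothesis was rejected. The short free-form Fortran sketch below uses a hypothetical reject count (1234 is an illustrative number, not a thesis result) to show why the listing divides by real(nrep) rather than by nrep itself.

```fortran
program power_ratio
  implicit none
  integer :: nrej, nrep
  real :: pwr

  nrej = 1234   ! hypothetical number of rejections (assumption)
  nrep = 5000   ! repetitions, as required by the study

  ! Converting nrep to real forces a real-valued quotient.
  pwr = nrej / real(nrep)
  print *, 'estimated power        =', pwr

  ! With two integers Fortran truncates toward zero, so any
  ! reject count below nrep would report a power of 0.
  print *, 'integer division gives =', nrej / nrep

  if (abs(pwr - 0.2468) > 1.0e-5) error stop 'unexpected power'
  if (nrej / nrep /= 0) error stop 'unexpected integer quotient'
end program power_ratio
```

The same pattern applies to each of KSpwr, ADpwr, CVpwr, and X2pwr in the main program.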




Subroutine  PARETO 

c**                                                              **
c**                BEGIN SUBROUTINE PARETO                       **
c**                                                              **
c******************************************************************
c
c  Ref: Appendix B, Fig 7, Step 1.
c
c  Purpose: For a specified sample size n, generate n random
c           deviates from a Pareto distribution with parameters of
c           location, scale, and shape set to specified positive
c           values.
c
c  Variables:
c     r     = array containing n random numbers
c     ac    = actual shape parameter of Pareto deviates
c     x     = array containing n Pareto deviates
c     n     = sample size
c     dseed = random number seed
c
c  Input:  dseed = random number seed (from MAIN program)
c          n     = sample size = 5, 15, or 25 (MAIN DO Loop 70)
c          nalt  = alternate CDF counter (MAIN DO Loop 60)
c
c  IMSL Subroutines:
c
c     GGUBFS - generates random numbers distrib uniformly on (0,1)
c     VSRTA  - arranges a set of numbers in ascending order
c
c  Calculate:
c
c     x(j) = (1/r(j)) ** (1/ac)  for j = 1,2,...,n  (from eqn 48)
c
c     x(a',b') = b' * ( ( x(a,b) - a ) / b) + a'    (from eqn 50)
c
c  Output:  x = array of n random Pareto deviates
c
c  Declare Variables:
c
      common dseed,x,n,c,nc,B,D,ablu,bblu,P,Bsum1,Bxsum1,
     1       Bxsum2,Bxsm2c,KS,AD,CVM,it,nsiz,nshp,nrep,
     1       nalt,nalf,nrKS,nrAD,nrCV,nrX2,X2
      integer n,nsiz,nshp,it,nrep,nrKS(2,2,3,8),nrAD(2,2,3,8),
     1        nrCV(2,2,3,8)
      real x(25),ablu,bblu,B(25),D,KS(2,2,3,8),AD(2,2,3,8),
     1     CVM(2,2,3,8),c,nc,Bsum1,Bxsum1,Bxsum2,Bxsm2c,
     1     P(25),r(25),alpha,KSpwr(2,2,3,8),ADpwr(2,2,3,8),
     1     CVpwr(2,2,3,8),ac
      double precision dseed
c
      if (nalt .eq. 1) ac = 1.0
      if (nalt .eq. 2) ac = 3.5
      if (nalt .eq. 3) ac = 2.0
c
c  -  Begin DO Loop 10 to Generate n Random Pareto Deviates  -
c
      do 10 j = 1,n
c
c        Use IMSL subroutine to generate random numbers:
         r(j) = GGUBFS(dseed)
c
c        Use eqn 48 to transform them to Pareto deviates
c        with location a = 1 and scale b = 1:
         x(j) = (1.0/r(j)) ** (1.0/ac)
c
c        Use eqn 50 to transform to Pareto deviates with
c        a = 2, b = 3 for the second alternate CDF:
         if (nalt .eq. 2) x(j) = 3. * x(j) - 1.
c
c        Use eqn 50 to transform to Pareto deviates with
c        a = 10, b = 5 for the third alternate CDF:
         if (nalt .eq. 3) x(j) = 5. * x(j) + 5.
c
 10   continue
c
c  -  End DO Loop 10 after Generating n Random Deviates  -
c
      return
      end
c
c  END SUBROUTINE PARETO
c******************************************************************


Subroutine  BXVALS 

c******************************************************************
c**                                                              **
c**                BEGIN SUBROUTINE BXVALS                       **
c**                                                              **
c******************************************************************
c
c  Ref: Appendix B, Fig. 7, Step 2.
c
c  Purpose: For a given sample size n, calculate the B values
c           used to find the BLUEs of location and scale.  Also
c           find the sum of the first n-1 values of B(i).  Then,
c           compute the three values equal to the sums of the
c           first n-1, the first n-2, and (for hypothesized
c           c = .5, 1, or 2) the first n-2/c values of B(i)x(i).
c
c  Variables:  c      = null-hypothesis shape parameter
c              n      = sample size
c              x      = array containing n ordered deviates
c                       from an alternate distribution
c              B      = array containing n values of B
c              Bsum1  = sum of B(i) values for i = 1,2,...,(n-1)
c              Bxsum1 = sum of B(i)x(i) for i = 1,2,...,(n-1)
c              Bxsum2 = sum of B(i)x(i) for i = 1,2,...,(n-2)
c              Bxsm2c = sum of B(i)x(i) for i = 1,2,...,(n-2/c)
c
c  Input:
c     c  = null-hyp shape parameter (from MAIN DO Loop 90)
c     n  = sample size = 5, 15, or 25 (from MAIN DO Loop 70)
c     nc = n*c (from MAIN program)
c     x  = ordered deviates of alternate CDF (from MAIN)
c
c  Calculate:
c
c     B(i)   = [1 - 2/(c*(n-i+1))] * B(i-1)            (eqn 29)
c     Bsum1  = B(1) + B(2) + ... + B(n-1)
c     Bxsum1 = B(1)*x(1) + ... + B(n-1)*x(n-1)
c     Bxsum2 = B(1)*x(1) + ... + B(n-2)*x(n-2)
c     Bxsm2c = B(1)*x(1) + ... + B(n-2/c)*x(n-2/c)
c
c  Output:
c     B      = array containing n values of B
c     Bsum1  = sum of first (n-1) B values
c     Bxsum1 = sum of first (n-1) B*x values
c     Bxsum2 = sum of first (n-2) B*x values
c     Bxsm2c = sum of first (n-2/c) B*x (if 2/c is integer)
c
c  Declare variables:
c
      common dseed,x,n,c,nc,B,D,ablu,bblu,P,Bsum1,Bxsum1,
     1       Bxsum2,Bxsm2c,KS,AD,CVM,it,nsiz,nshp,nrep,
     1       nalt,nalf,nrKS,nrAD,nrCV,nrX2,X2
      integer n,nsiz,nshp,it,nrep,nrKS(2,2,3,8),nrAD(2,2,3,8),
     1        nrCV(2,2,3,8)
      real x(25),ablu,bblu,B(25),D,KS(2,2,3,8),AD(2,2,3,8),
     1     CVM(2,2,3,8),c,nc,Bsum1,Bxsum1,Bxsum2,Bxsm2c,
     1     P(25),r(25),alpha,KSpwr(2,2,3,8),ADpwr(2,2,3,8),
     1     CVpwr(2,2,3,8)
      double precision dseed
c
c  Calculate the first B value (eqn 25):
c
      B(1) = 1.0 - 2.0/nc
c
c  -  Begin DO Loop 10 to Find the 2nd thru nth B values  -
c
      do 10 j = 2,n
         B(j) = B(j-1) * (1.0 - (2.0/(c*(n-j+1))))
 10   continue
c
c  -  End DO Loop 10  -
c
      Bsum1 = 0
c
c  -  Begin DO Loop 20 to Sum the First n-1 Values of B  -
c
      do 20 k=1,(n-1)
         Bsum1 = Bsum1 + B(k)
 20   continue
c
c  -  End DO Loop 20  -
c
      Bxsum1 = 0
c
c  -  Begin DO Loop 30 to Sum the First n-1 Values of Bx  -
c
      do 30 l=1,(n-1)
         Bxsum1 = Bxsum1 + (B(l)*x(l))
 30   continue
c
c  -  End DO Loop 30  -
c
      Bxsum2 = Bxsum1 - (B(n-1)*x(n-1))
c
c  -  Find Bxsm2c When 2/c is an Integer (c=.5, 1, or 2)  -
c
      Bxsm2c = 0
      if (c .eq. 1.0) then
         Bxsm2c = Bxsum2
      else if (c .eq. 2.0) then
         Bxsm2c = Bxsum1
      else if (c .eq. 0.5) then
         Bxsm2c = Bxsum2 - (B(n-3)*x(n-3)) - (B(n-2)*x(n-2))
      end if
c
      return
      end
c
c  END SUBROUTINE BXVALS
c******************************************************************
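
The recursion for B and the partial sums built from it are easy to check on a small case. The following free-form Fortran sketch mirrors the arithmetic of BXVALS for n = 5 and c = 3.5; the ordered sample in x is hypothetical, and the array intrinsic sum replaces DO Loops 20 and 30.

```fortran
program b_values
  implicit none
  integer, parameter :: n = 5
  real :: b(n), x(n), c, nc
  real :: bsum1, bxsum1, bxsum2
  integer :: j

  c  = 3.5
  nc = real(n) * c
  x  = [1.1, 1.3, 1.8, 2.6, 4.0]   ! hypothetical ordered sample

  ! First B value, then the recursion for the 2nd through nth.
  b(1) = 1.0 - 2.0 / nc
  do j = 2, n
     b(j) = b(j-1) * (1.0 - 2.0 / (c * real(n - j + 1)))
  end do

  ! Partial sums over the first n-1 terms.
  bsum1  = sum(b(1:n-1))
  bxsum1 = sum(b(1:n-1) * x(1:n-1))
  ! The n-2 term sum is obtained by removing the (n-1)th product.
  bxsum2 = bxsum1 - b(n-1) * x(n-1)

  print *, 'B      =', b
  print *, 'Bsum1  =', bsum1
  print *, 'Bxsum1 =', bxsum1
  print *, 'Bxsum2 =', bxsum2

  ! Cross-check the subtraction against a direct sum.
  if (abs(bxsum2 - sum(b(1:n-2) * x(1:n-2))) > 1.0e-5) &
       error stop 'Bxsum2 does not match the direct sum'
end program b_values
```

Setting c to 1.0 in this sketch makes B(n-1) and B(n) vanish, since the factor 1 - 2/(c*(n-j+1)) is zero at j = n-1.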




Subroutine  BLCLE2 

c******************************************************************
c**                                                              **
c**                BEGIN SUBROUTINE BLCLE2                       **
c**                                                              **
c******************************************************************
c
c  Ref: Appendix B, Figure 7, Step 2 (continued).
c
c  Purpose: Given an ordered sample of size n and null-hypothesis
c           c<=2, calculate the BLUEs of location a and scale b.
c
c  Variables:
c     x      = array containing n ordered deviates from a CDF
c     c      = null-hypothesis Pareto shape parameter
c     n      = sample size
c     B      = array of B values used to calculate the BLUEs
c     nc     = product of n and c
c     Coef1  = coefficient used to compute BLUE of location a
c     Coef2  = coefficient used to compute BLUE of location a
c     Coef3  = coefficient used to compute BLUE of scale b
c     Bxsum2 = sum of B(i)*x(i) terms for i = 1,...,n-2
c     Bxsm2c = sum of B(i)*x(i) terms for i = 1,...,n-2/c
c     ablu   = BLUE of the location parameter a
c     bblu   = BLUE of the scale parameter b
c     U      = value used to compute BLUEs when c = 1.5
c     Termi  = terms used to compute U (i=1,2,3)
c
c=================================================================
c
c  Input:  x      = array of n ordered deviates (from MAIN Program)
c          c      = null-hyp shape = 1.0 (from MAIN DO Loop 90)
c          n      = sample size = 5, 15, or 25 (from MAIN DO Loop 70)
c          nc     = n*c (from MAIN program)
c          B      = array containing n values of B (from BXVALS)
c          Bxsum2 = sum of first n-2 values of B*x (from BXVALS)
c          Bxsm2c = sum of first n-2/c values of B*x (from BXVALS)
c
c  Calculate (if c = 0.5, 1.0, or 2.0):
c
c     Coef1 = [(c+1)*(c+2)] / [(nc-2)*(nc-c-2)]
c     Coef2 = (nc-2) / (c+2)
c     ablu  = x(1) - Coef1 * [Bxsm2c - (Coef2*x(1))]    (eqn 34)
c     bblu  = (nc-1) * [x(1) - ablu]                    (eqn 35)
c
c  Calculate (if c = 1.5):
c
c     Term1 = (nc-2) * (nc-c-2)
c     Term2 = nc * (c-2) * B(n-1)
c     Term3 = (nc-1) * (c+2)
c     Coef3 = [(nc-1)/nc] * (nc-2-U)
c     U     = (Term1 - Term2) / Term3                   (eqn 39)
c     ablu  = x(1) - bblu / (nc-1)                      (eqn 37)
c     bblu  = (1/U) * [(c+1)*(Bxsum2) + (2c-1)*B(n-1)*x(n-1)
c                      - Coef3 * x(1)]                  (eqn 38)
c
c  Output:
c     ablu = BLUE of location parameter a
c     bblu = BLUE of scale parameter b
c


c  Declare Variables:
c
      common dseed,x,n,c,nc,B,D,ablu,bblu,P,Bsum1,Bxsum1,
     1       Bxsum2,Bxsm2c,KS,AD,CVM,it,nsiz,nshp,nrep,
     1       nalt,nalf,nrKS,nrAD,nrCV,nrX2,X2
      integer n,nsiz,nshp,it,nrep,nrKS(2,2,3,8),nrAD(2,2,3,8),
     1        nrCV(2,2,3,8)
      real x(25),ablu,bblu,B(25),D,KS(2,2,3,8),AD(2,2,3,8),
     1     CVM(2,2,3,8),c,nc,Bsum1,Bxsum1,Bxsum2,Bxsm2c,
     1     P(25),r(25),alpha,KSpwr(2,2,3,8),ADpwr(2,2,3,8),
     1     CVpwr(2,2,3,8),Term1,Term2,Term3,Coef1,Coef2,
     1     Coef3,U
      double precision dseed
c
      if ((c.eq.0.5) .or. (c.eq.1.0) .or. (c.eq.2.0)) then
         Coef1 = ((c+1.0)*(c+2.0)) / ((nc-2.0)*(nc-c-2.0))
         Coef2 = (nc-2.0) / (c+2.0)
         ablu = x(1) - Coef1 * (Bxsm2c - (Coef2*x(1)))
         bblu = (nc-1.0) * (x(1) - ablu)
c
      else if (c .eq. 1.5) then
         Term1 = (nc-2.0) * (nc-c-2.0)
         Term2 = nc * (c-2.0) * B(n-1)
         Term3 = (nc-1.0) * (c+2.0)
         U = (Term1 - Term2) / Term3
         Coef3 = ((nc-1.0)/nc) * (nc-2.0-U)
         bblu = (1.0/U) * ( (c+1.0) * (Bxsum2)
     1          + (2.0*c-1.0)*B(n-1)*x(n-1) - Coef3 * x(1) )
         ablu = x(1) - (bblu / (nc-1.0))
c
      end if
c
      return
      end
c
c  END SUBROUTINE BLCLE2
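
As a worked illustration of the c = 1.0 branch (eqns 34 and 35), the free-form Fortran sketch below recomputes the B values and the two BLUEs for a hypothetical ordered sample of size 5. It then repeats the calculation on the same sample shifted by 10 as a sanity check: for these formulas the location estimate should move by 10 and the scale estimate should stay the same. The helper name blue and the sample values are assumptions made for the example, not part of the thesis code.

```fortran
program blue_c1
  implicit none
  integer, parameter :: n = 5
  real, parameter :: c = 1.0
  real :: x(n), a1, b1, a2, b2

  x = [1.2, 1.5, 2.1, 3.4, 6.0]   ! hypothetical ordered sample

  call blue(x, a1, b1)
  call blue(x + 10.0, a2, b2)     ! same sample shifted by 10

  print *, 'ablu, bblu (original) =', a1, b1
  print *, 'ablu, bblu (shifted)  =', a2, b2

  ! Location should shift with the data; scale should not change.
  if (abs(a2 - a1 - 10.0) > 1.0e-3) error stop 'location check'
  if (abs(b2 - b1) > 1.0e-3) error stop 'scale check'

contains

  ! Hypothetical helper: BLUEs for c = 1 via eqns 34 and 35.
  subroutine blue(xs, ablu, bblu)
    real, intent(in)  :: xs(n)
    real, intent(out) :: ablu, bblu
    real :: b(n), nc, coef1, coef2, bxsm2c
    integer :: j

    nc = real(n) * c
    b(1) = 1.0 - 2.0 / nc
    do j = 2, n
       b(j) = b(j-1) * (1.0 - 2.0 / (c * real(n - j + 1)))
    end do

    ! For c = 1 the upper limit n - 2/c equals n - 2.
    bxsm2c = sum(b(1:n-2) * xs(1:n-2))

    coef1 = ((c + 1.0) * (c + 2.0)) / ((nc - 2.0) * (nc - c - 2.0))
    coef2 = (nc - 2.0) / (c + 2.0)
    ablu  = xs(1) - coef1 * (bxsm2c - coef2 * xs(1))
    bblu  = (nc - 1.0) * (xs(1) - ablu)
  end subroutine blue

end program blue_c1
```

For n = 5 and c = 1 the coefficients reduce to Coef1 = Coef2 = 1, so ablu is a weighted combination of the three smallest order statistics with weights that sum to one.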




Subroutine BLCGT2

c******************************************************************
c**                                                              **
c**                BEGIN SUBROUTINE BLCGT2                       **
c**                                                              **
c******************************************************************
c
c  Ref: Appendix B, Figure 7, Step 2 (continued).
c
c=================================================================
c
c  Purpose: Given an ordered sample of size n and a Pareto null
c           hypothesis with shape c > 2, calculate the best
c           linear unbiased estimates (BLUEs) of location and
c           scale.
c
c  Variables:  x      = array containing n ordered deviates
c              c      = null-hypothesis Pareto shape parameter
c              n      = sample size
c              nc     = product of n and c
c              B      = array of B values used to calculate the BLUEs
c              Bsum1  = sum of B(i) terms for i = 1,...,n-1
c              Bxsum1 = sum of B(i)*x(i) terms for i = 1,...,n-1
c              D      = value used to calculate the BLUEs
c              YV     = value used to calculate the BLUEs
c              ablu   = BLUE for location parameter a
c              bblu   = BLUE for scale parameter b
c
c=================================================================
c
c  Input:  x      = array of ordered deviates (from MAIN Program)
c          c      = shape parameter = 3.5 (from MAIN DO Loop 90)
c          n      = sample size = 5, 15, or 25 (MAIN DO Loop 70)
c          nc     = n*c (from MAIN Program)
c          B      = array of B values (from BXVALS)
c          Bsum1  = sum of first (n-1) B values (from BXVALS)
c          Bxsum1 = sum of first n-1 B*x values (from BXVALS)
c
c  Calculate:
c
c     D    = [(c+1) * Bsum1] + [(c-1) * B(n)]              (eqn 21)
c     YV   = (c+1)*Bxsum1 + (c-1)*B(n)*x(n) - D*x(1)       (eqn 22)
c     ablu = x(1) - YV/[(nc-1)*(nc-2) - D*nc]              (eqn 17)
c     bblu = (nc-1) * [ x(1) - ablu ]                      (eqn 18)
c




Subroutine  HYPCDF 

c******************************************************************
c**                                                              **
c**                BEGIN SUBROUTINE HYPCDF                       **
c**                                                              **
c******************************************************************
c
c  Ref: Appendix B, Figure 7, Step 2 (continued).
c
c=================================================================
c
c  Purpose: Given an ordered sample of size n, a Pareto null-hyp
c           of shape c, and the BLUEs of location a and scale b,
c           compute the hypothesized Pareto distribution
c           function P(i) for i = 1,2,...,n.
c
c=================================================================
c
c  Variables:
c     x    = array containing n ordered deviates
c     n    = sample size
c     c    = null hypothesized Pareto shape parameter
c     ablu = BLUE of location a
c     bblu = BLUE of scale b
c     P    = array containing n points of the
c            hypothesized Pareto CDF
c
c=================================================================
c
c  Input:
c     x    = array of n ordered deviates (from MAIN Program)
c     c    = null hyp shape = 1.0 or 3.5 (MAIN DO Loop 90)
c     n    = sample size = 5, 15, or 25 (from MAIN DO Loop 70)
c     ablu = BLUE of location a (from BLCLE2 or BLCGT2)
c     bblu = BLUE of scale b (from BLCLE2 or BLCGT2)
c
c  Calculate:
c
c     P(i) = 1 - [1 + (x(i) - ablu)/bblu] ** (-c)        (eqn 40)
c
c=================================================================
c
c  Output:  P = array of n points of the hypothesized CDF
c
c  Declare Variables:
c
      common dseed,x,n,c,nc,B,D,ablu,bblu,P,Bsum1,Bxsum1,
     1       Bxsum2,Bxsm2c,KS,AD,CVM,it,nsiz,nshp,nrep,
     1       nalt,nalf,nrKS,nrAD,nrCV,nrX2,X2
      integer n,nsiz,nshp,it,nrep,nrKS(2,2,3,8),nrAD(2,2,3,8),
     1        nrCV(2,2,3,8)
      real x(25),ablu,bblu,B(25),D,KS(2,2,3,8),AD(2,2,3,8),
     1     CVM(2,2,3,8),c,nc,Bsum1,Bxsum1,Bxsum2,Bxsm2c,
     1     P(25),r(25),alpha,KSpwr(2,2,3,8),ADpwr(2,2,3,8),
     1     CVpwr(2,2,3,8)
      double precision dseed
c
      do 10 i = 1,n
         P(i) = 1.0 - (1.0 + (x(i) - ablu)/bblu) ** (-c)
 10   continue
c
      return
      end
c
c  END SUBROUTINE HYPCDF
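
Equation 40 can be evaluated for a whole sample in one array expression. The free-form Fortran sketch below does so for hypothetical inputs (location 1, scale 1, shape 1, and the points 1 through 5), for which the CDF simplifies to 1 - 1/x.

```fortran
program hyp_cdf
  implicit none
  integer, parameter :: n = 5
  real :: x(n), p(n), ablu, bblu, c
  integer :: i

  ! Hypothetical ordered sample and parameter estimates.
  x    = [1.0, 2.0, 3.0, 4.0, 5.0]
  ablu = 1.0
  bblu = 1.0
  c    = 1.0

  ! Eqn 40 applied elementwise to the sample.
  p = 1.0 - (1.0 + (x - ablu) / bblu) ** (-c)

  do i = 1, n
     print '(f6.2, f10.4)', x(i), p(i)
  end do

  ! A CDF evaluated on an ordered sample must stay within [0,1)
  ! and must not decrease.
  if (any(p < 0.0) .or. any(p >= 1.0)) error stop 'CDF out of range'
  if (any(p(2:n) < p(1:n-1))) error stop 'CDF not nondecreasing'
end program hyp_cdf
```

With these inputs the values of P are 0, 1/2, 2/3, 3/4, and 4/5, which makes the output easy to verify by hand.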


      Subroutine TESTAT
c*****************************************************************
c**                                                             **
c**                 BEGIN SUBROUTINE TESTAT                     **
c**                                                             **
c*****************************************************************
c
c  Ref:  Appendix B, Figure 7, Step 2.
c
c=================================================================
c
c  Purpose:  Given a sample size n, and the hypothesized Pareto
c            distribution function P(i), compute values of the
c            test statistics of the Chi-square and the modified
c            K-S, A-D, and CVM goodness-of-fit tests.
c
c  Variables:
c
c     n      = sample size
c     nshp   = null-hyp shape counter (1: c=1.0, 2: c=3.5)
c     nalf   = alpha level counter (1: a=.05, 2: a=.01)
c     nsiz   = sample size counter (1: n=5, 2: n=15, 3: n=25)
c     nalt   = alternate distribution counter
c     P      = array of n values of the hypothesized Pareto CDF
c     DP     = positive differences between EDF and CDF points
c     DM     = negative differences between EDF and CDF points
c     DPLUS  = maximum positive difference (largest DP value)
c     DMINUS = maximum negative difference (largest DM value)
c     KS     = values of the modified K-S test statistic
c     AL     = value used to calculate the A-D test statistic
c     AM     = value used to calculate the A-D test statistic
c     AN     = AL + AM
c     AAA    = values to be summed for A-D test statistic
c     SAAA   = sum of AAA values
c     AD     = values of the modified A-D test statistic
c     ACV    = squared quantities in the C-VM formula
c     SACV   = sum of the ACV values
c     CVM    = values of the modified C-VM test statistic
c     ablu   = BLUE of location parameter a
c     bblu   = BLUE of scale parameter b
c     c      = null-hypothesized Pareto shape parameter
c     obs    = number of observations in each of 5 cells
c     rtend  = right endpoint of a cell
c     X2     = array of values of the Chi-square test statistic
c
c=================================================================




c  Inputs:
c
c     n    = sample size = 5, 15, or 25 (from MAIN DO Loop 70)
c     P    = array of n values of hypothesized CDF (from HYPCDF)
c     nshp = null-hyp shape counter (from MAIN DO Loop 90)
c     nalf = significance level counter (from MAIN DO Loop 80)
c     nsiz = sample size counter (from MAIN DO Loop 70)
c     nalt = alternate CDF counter (from MAIN DO Loop 60)
c     ablu = BLUE of location a (from BLCLE2 or BLCGT2)
c     bblu = BLUE of scale b (from BLCLE2 or BLCGT2)
c     c    = hypothesized Pareto shape (from MAIN DO Loop 90)
c
c  Calculations for K-S test statistic (eqns 41 & 42):
c
c     DP(i) = ABS[ (i/n) - P(i) ]
c     DM(i) = ABS[ P(i) - (i-1)/n ]
c
c     DPLUS  = max [ DP(i) ]  for i=1,2,...,n
c     DMINUS = max [ DM(i) ]  for i=1,2,...,n
c
c     KS = max (DPLUS, DMINUS)
c
c  Calculations for A-D test statistic (eqn 43):
c
c     AL(j) = ln (P(j))
c     AM(j) = ln (1 - P(n+1-j))
c     AN(j) = AL(j) + AM(j)
c
c     AAA(j) = (2*j - 1) * AN(j)
c     SAAA   = AAA(1) + AAA(2) + ... + AAA(n)
c
c     AD = -n - (1/n) * SAAA
c
c  Calculations for C-VM test statistic (eqn 44):
c
c     ACV(k) = [ P(k) - (2*k - 1)/(2*n) ]**2
c     SACV   = ACV(1) + ACV(2) + ... + ACV(n)
c
c     CVM = (1/(12*n)) + SACV
c
c  Calculations for Chi-square test statistic (eqn 62):
c
c     rtend(i) = ablu - bblu + bblu * (1 - .2*i) ** (-1/c)
c     ex = n / 5.
c
c     X2 = [(obs(1)-ex)**2] / ex + [(obs(2)-ex)**2] / ex
c          + ... + [(obs(5)-ex)**2] / ex
c
c=================================================================

c
c  Declare Variables:
c
      common dseed,x,n,c,nc,B,D,ablu,bblu,P,Bsum1,Bxsum1,
     1       Bxsum2,Bxsm2c,KS,AD,CVM,it,nsiz,nshp,nrep,
     1       nalt,nalf,nrKS,nrAD,nrCV,nrX2,X2
      integer n,nsiz,nshp,it,nrep,nrKS(2,2,3,8),nrAD(2,2,3,8),
     1        nrCV(2,2,3,8),obs(5),nrX2(2,2,3,8)
      real x(25),ablu,bblu,B(25),D,KS(2,2,3,8),AD(2,2,3,8),
     1     CVM(2,2,3,8),c,nc,Bsum1,Bxsum1,Bxsum2,Bxsm2c,
     1     P(25),r(25),alpha,KSpwr(2,2,3,8),ADpwr(2,2,3,8),
     1     CVpwr(2,2,3,8),DP(25),DM(25),DPLUS,DMINUS,AL(25),
     1     AM(25),AN(25),AAA(25),SAAA,ACV(25),SACV,rtend(4),
     1     X2crit(2,2,3),X2(2,2,3,8),ex
      double precision dseed
c

c  -- Compute the K-S Test Statistic (eqns 41 & 42): --
c
      DPLUS = 0
      DMINUS = 0
      do 5 ik = 1,25
         DP(ik) = 0
         DM(ik) = 0
    5 continue
c
      do 10 i = 1,n
c
         DP(i) = ABS( (i/real(n)) - P(i) )
         DM(i) = ABS( P(i) - (i-1)/real(n) )
c
c        if (nshp.eq.1 .and. nalf.eq.2 .and. n.eq.5 .and.
c    1       nalt .lt. 3) then
c           print*,'P(i)=',P(i),'DP(i)=',DP(i),'DM(i)=',DM(i)
c        end if
c
   10 continue
c
      DPLUS = MAX( DP(1),DP(2),DP(3),DP(4),DP(5),DP(6),DP(7),
     1             DP(8),DP(9),DP(10),DP(11),DP(12),DP(13),DP(14),
     1             DP(15),DP(16),DP(17),DP(18),DP(19),DP(20),
     1             DP(21),DP(22),DP(23),DP(24),DP(25) )
c
      DMINUS = MAX( DM(1),DM(2),DM(3),DM(4),DM(5),DM(6),DM(7),
     1              DM(8),DM(9),DM(10),DM(11),DM(12),DM(13),DM(14),
     1              DM(15),DM(16),DM(17),DM(18),DM(19),DM(20),
     1              DM(21),DM(22),DM(23),DM(24),DM(25) )
c
      KS(nshp,nalf,nsiz,nalt) = MAX(DPLUS,DMINUS)
c
c     print*,'***********************'
c     print*,' '
c     print*,'KS VALUES FROM TESTAT -- ITERATION =',it
c     print*,'c=',c,'nalf=',nalf,' ** n=',n,' ** nalt=',nalt
c     print*,'KS Stat=',KS(nshp,nalf,nsiz,nalt),
c    1       ' ** DPLUS=',DPLUS,' ** DMINUS=',DMINUS
c     print*,' '

c  -- Compute the A-D Test Statistic (eqn 43): --
c
      SAAA = 0
      do 20 j = 1,n
         AL(j) = log (P(j))
         AM(j) = log (1.0 - P(n+1-j))
         AN(j) = AL(j) + AM(j)
         AAA(j) = (2.0*j - 1.0) * AN(j)
         SAAA = SAAA + AAA(j)
   20 continue
c
      AD(nshp,nalf,nsiz,nalt) = -n - (1.0/real(n)) * SAAA
c
c  -- Compute the C-VM Test Statistic (eqn 44): --
c
      SACV = 0
      do 30 k = 1,n
         ACV(k) = ( P(k) - (2.0*k - 1.0)/(2.0*real(n)) )**2
         SACV = SACV + ACV(k)
   30 continue
c
      CVM(nshp,nalf,nsiz,nalt) = SACV + (1.0/(12.0*real(n)))

c  -- Compute the Chi-Square Test Statistic (eqn 62): --
c
      do 40 in = 1,5
         obs(in) = 0
   40 continue
c
      do 50 ki = 1,4
         rtend(ki) = ablu-bblu + bblu*(1.-.2*ki)**(-1./c)
   50 continue
c
      do 60 m = 1,n
         if( x(m) .le. rtend(1) ) then
            obs(1) = obs(1) + 1
         else if (x(m).le.rtend(2)) then
            obs(2) = obs(2) + 1
         else if (x(m).le.rtend(3)) then
            obs(3) = obs(3) + 1
         else if (x(m).le.rtend(4)) then
            obs(4) = obs(4) + 1
         else
            obs(5) = obs(5) + 1
         end if
c
   60 continue
c
      ex = n / 5.
c
      X2(nshp,nalf,nsiz,nalt) = ( (obs(1)-ex)**2 ) / ex
     1       + ((obs(2)-ex)**2)/ex + ((obs(3)-ex)**2)/ex
     1       + ((obs(4)-ex)**2)/ex + ((obs(5)-ex)**2)/ex
c
c     print*,' '
c     print*,'+ + + + + + + + + + + + + + + + + + + +'
c     print*,' '
c     print*,'X2 VALUES FROM TESTAT -- ITERATION = ',it
c     print*,'c=',c,'nalf =',nalf,' ** n=',n,' ** nalt=',nalt
c     print*,'RT ENDPOINTS OF INTERVALS:'
c     print*,rtend(1),rtend(2),rtend(3),rtend(4)
c     print*,'x(1)=',x(1),'x(10)=',x(10),'x(25)=',x(25)
c     print*,'OBSERVATIONS PER CELL:'
c     print*,'Cell 1:',obs(1),' ** Cell 2:',obs(2)
c     print*,'Cell 3:',obs(3),' ** Cell 4:',obs(4)
c     print*,'Cell 5:',obs(5)
c     print*,'CHI SQUARE TEST STAT:'
c     print*,'X2 Stat=',X2(nshp,nalf,nsiz,nalt)
c     print*,' '
c
      return
      end
c
c=================================================================
c  END SUBROUTINE TESTAT
c*****************************************************************
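The four statistics computed by TESTAT (eqns 41 to 44 and eqn 62) can be restated compactly. The Python sketch below is an illustrative translation under stated assumptions, not the thesis program: the function names `edf_statistics` and `chi_square_statistic` are hypothetical, the input `p` is assumed to hold the hypothesized CDF evaluated at the ordered sample with every value strictly between 0 and 1, and `a` and `b` stand in for `ablu` and `bblu`.

```python
import math


def edf_statistics(p):
    """Return (KS, AD, CVM) for sorted CDF values p, following eqns 41-44."""
    n = len(p)
    # K-S: largest gap between the EDF steps and the hypothesized CDF.
    d_plus = max(abs((i + 1) / n - p[i]) for i in range(n))
    d_minus = max(abs(p[i] - i / n) for i in range(n))
    ks = max(d_plus, d_minus)
    # A-D: weighted sum of log terms; p[n - j] is P(n+1-j) in 1-based terms.
    saaa = sum((2 * j - 1) * (math.log(p[j - 1]) + math.log(1.0 - p[n - j]))
               for j in range(1, n + 1))
    ad = -n - saaa / n
    # C-VM: squared distances from the plotting positions (2k-1)/(2n).
    sacv = sum((p[k - 1] - (2 * k - 1) / (2 * n)) ** 2
               for k in range(1, n + 1))
    cvm = 1.0 / (12 * n) + sacv
    return ks, ad, cvm


def chi_square_statistic(x, a, b, c, cells=5):
    """Chi-square statistic with equiprobable cells under the Pareto null."""
    n = len(x)
    # Right endpoints: points where the hypothesized CDF equals k/cells.
    rtend = [a - b + b * (1.0 - k / cells) ** (-1.0 / c)
             for k in range(1, cells)]
    obs = [0] * cells
    for xi in x:
        # Cell index = number of right endpoints lying below the observation.
        obs[sum(1 for r in rtend if xi > r)] += 1
    ex = n / cells
    return sum((o - ex) ** 2 / ex for o in obs)


# Illustrative inputs only (not thesis data).
ks, ad, cvm = edf_statistics([0.1, 0.3, 0.5, 0.7, 0.9])
x2 = chi_square_statistic([0.1, 0.5, 1.0, 2.0, 10.0], a=0.0, b=1.0, c=1.0)
print(ks, ad, cvm, x2)
```

In the first example the CDF values sit exactly on the plotting positions (2k-1)/(2n), so the C-VM sum vanishes and only the 1/(12n) term remains. In the second, one observation falls in each of the five cells, so the Chi-square statistic is zero.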

      Subroutine COMPAR
c*****************************************************************
c**                                                             **
c**                 BEGIN SUBROUTINE COMPAR                     **
c**                                                             **
c*****************************************************************
c
c  Ref:  Appendix B, Figure 7, Step 3.
c
c  Purpose:
c
c     Compare a test statistic, calculated from Chi-square or the
c     modified Kolmogorov-Smirnov (K-S), Anderson-Darling (A-D),
c     or Cramer-von Mises (C-VM) test, against the appropriate
c     critical value.  From a series of test statistics, count the
c     number of times the null hypothesis is rejected, i.e., the
c     number of test statistic values that exceed the critical
c     value.  The K-S, A-D, and C-VM critical values were taken
c     from Tables VI-VIII of the thesis.
c
c  Variables:
c
c     c     = null-hypothesis Pareto shape parameter
c     alpha = significance level
c     n     = sample size
c     nshp  = shape parameter counter (1: c=1.0; 2: c=3.5)
c     nalf  = significance level counter (1: a=.05; 2: a=.01)
c     nsiz  = sample size counter (1: n=5; 2: n=15; 3: n=25)
c     KS    = array of modified K-S test statistics
c     CVM   = array of modified C-VM test statistics
c     AD    = array of modified A-D test statistics
c     X2    = array of Chi-square test statistics
c
c  Inputs:
c
c     c     = null-hyp shape parameter (from MAIN DO Loop 90)
c     alpha = significance level (from MAIN DO Loop 80)
c     n     = sample size (from MAIN DO Loop 80)
c     nshp  = shape parameter counter (from MAIN DO Loop 90)
c     nalf  = significance level counter (MAIN DO Loop 80)
c     nsiz  = sample size counter (from MAIN DO Loop 70)
c     nalt  = alternate CDF counter (from MAIN DO Loop 60)
c     KS    = array of K-S test statistics (from TESTAT)
c     CVM   = array of C-VM test stats (from TESTAT)
c     AD    = array of A-D test statistics (from TESTAT)
c     KScrit(nshp,nalf,nsiz) = K-S critical values (Table VI)
c     ADcrit(nshp,nalf,nsiz) = A-D critical values (Table VII)
c     CVcrit(nshp,nalf,nsiz) = CVM critical values (Table VIII)
c     X2crit(nshp,nalf,nsiz) = Chi-square critical values
c
c=================================================================
c
c  Calculations:  none
c
c=================================================================
c
c  Output:
c
c     nrKS = number of times null hypothesis is rejected under K-S
c     nrAD = number of times null hypothesis is rejected under A-D
c     nrCV = number of times null hypothesis is rejected under CVM
c     nrX2 = number of times null hyp is rejected under Chi-square
c


c=================================================================
c
c  Declare Variables:
c
      common dseed,x,n,c,nc,B,D,ablu,bblu,P,Bsum1,Bxsum1,
     1       Bxsum2,Bxsm2c,KS,AD,CVM,it,nsiz,nshp,nrep,
     1       nalt,nalf,nrKS,nrAD,nrCV,nrX2,X2
      integer n,nsiz,nshp,it,nrep,nrKS(2,2,3,8),nrAD(2,2,3,8),
     1        nrCV(2,2,3,8),nrX2(2,2,3,8)
      real x(25),ablu,bblu,B(25),D,KS(2,2,3,8),AD(2,2,3,8),
     1     CVM(2,2,3,8),c,nc,Bsum1,Bxsum1,Bxsum2,Bxsm2c,
     1     P(25),r(25),alpha,KSpwr(2,2,3,8),ADpwr(2,2,3,8),
     1     CVpwr(2,2,3,8),KScrit(2,2,3),ADcrit(2,2,3),
     1     CVcrit(2,2,3),X2crit(2,2,3),X2(2,2,3,8)
      double precision dseed
c
c     print*,'****************************************'
c     print*,'Numbers of Rejects at COMPAR Entrance'
c     print*,'c =',c,'nalf =',nalf,'n=',n,'nalt=',nalt
c     print*,'KS Rejects = ',nrKS(nshp,nalf,nsiz,nalt)
c     print*,'AD Rejects = ',nrAD(nshp,nalf,nsiz,nalt)
c     print*,'CV Rejects = ',nrCV(nshp,nalf,nsiz,nalt)
c     print*,'================================'
c

c  -- Input K-S Critical Values from Table VI: --
c
      KScrit(1,1,1) = .3676251
      KScrit(1,1,2) = .2157919
      KScrit(1,1,3) = .1698559
      KScrit(1,2,1) = .4074441
      KScrit(1,2,2) = .2468265
      KScrit(1,2,3) = .2007451
      KScrit(2,1,1) = .3493998
      KScrit(2,1,2) = .2376525
      KScrit(2,1,3) = .1886063
      KScrit(2,2,1) = .3815996
      KScrit(2,2,2) = .2743093
      KScrit(2,2,3) = .2182668
c
c  -- Input A-D Critical Values from Table VII: --
c
      ADcrit(1,1,1) = 1.236920
      ADcrit(1,1,2) = .8907447
      ADcrit(1,1,3) = .9147376
      ADcrit(1,2,1) = 2.076011
      ADcrit(1,2,2) = 1.250242
      ADcrit(1,2,3) = 1.311781
      ADcrit(2,1,1) = .6840515
      ADcrit(2,1,2) = .8985860
      ADcrit(2,1,3) = .9520599
      ADcrit(2,2,1) = .9126385
      ADcrit(2,2,2) = 1.268849
      ADcrit(2,2,3) = 1.449695
c
c  -- Input C-VM Critical Values from Table VIII: --
c
      CVcrit(1,1,1) = .1389776
      CVcrit(1,1,2) = .1312229
      CVcrit(1,1,3) = .1386932
      CVcrit(1,2,1) = .1738497
      CVcrit(1,2,2) = .1923594
      CVcrit(1,2,3) = .1988135
      CVcrit(2,1,1) = .1186844
      CVcrit(2,1,2) = .1561372
      CVcrit(2,1,3) = .1618638
      CVcrit(2,2,1) = .1574178
      CVcrit(2,2,2) = .2217665
      CVcrit(2,2,3) = .2403474
c
c  -- Input Chi-square Critical Values: --
c
      X2crit(1,1,1) = 6.000003
      X2crit(1,1,2) = 7.333337
      X2crit(1,1,3) = 7.600005
      X2crit(1,2,1) = 12.00000
      X2crit(1,2,2) = 10.66667
      X2crit(1,2,3) = 10.80000
      X2crit(2,1,1) = 6.000003
      X2crit(2,1,2) = 7.333337
      X2crit(2,1,3) = 7.600005
      X2crit(2,2,1) = 6.000003
      X2crit(2,2,2) = 10.46378
      X2crit(2,2,3) = 10.80000

c
c  -- Compare Test Statistics vs Critical Values --
c
c     print*,'$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$'
c     print*,'BEFORE REJ COUNTER IS INCREMENTED:'
c     print*,'c =',c,'nalf =',nalf,' ** n=',n,' ** nalt=',nalt
c     print*,'KS Stat =',KS(nshp,nalf,nsiz,nalt),
c    1       ' Crit =',KScrit(nshp,nalf,nsiz)
c     print*,'AD Stat =',AD(nshp,nalf,nsiz,nalt),
c    1       ' Crit =',ADcrit(nshp,nalf,nsiz)
c     print*,'CV Stat =',CVM(nshp,nalf,nsiz,nalt),
c    1       ' Crit =',CVcrit(nshp,nalf,nsiz)
c     print*,'X2 Stat =',X2(nshp,nalf,nsiz,nalt),
c    1       ' Crit =',X2crit(nshp,nalf,nsiz)
c     print*,'$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$'
c
      if ( KS(nshp,nalf,nsiz,nalt) .gt. KScrit(nshp,nalf,nsiz) )
     1   nrKS(nshp,nalf,nsiz,nalt) = nrKS(nshp,nalf,nsiz,nalt) + 1
c
      if ( AD(nshp,nalf,nsiz,nalt) .gt. ADcrit(nshp,nalf,nsiz) )
     1   nrAD(nshp,nalf,nsiz,nalt) = nrAD(nshp,nalf,nsiz,nalt) + 1
c
      if ( CVM(nshp,nalf,nsiz,nalt) .gt. CVcrit(nshp,nalf,nsiz) )
     1   nrCV(nshp,nalf,nsiz,nalt) = nrCV(nshp,nalf,nsiz,nalt) + 1
c
      if ( X2(nshp,nalf,nsiz,nalt) .gt. X2crit(nshp,nalf,nsiz) )
     1   nrX2(nshp,nalf,nsiz,nalt) = nrX2(nshp,nalf,nsiz,nalt) + 1
c
c     print*,' '
c     print*,'Numbers of Rejects at COMPAR Exit'
c     print*,'c =',c,'nalf =',nalf,' n=',n,' nalt=',nalt
c     print*,'KS Rejects = ',nrKS(nshp,nalf,nsiz,nalt)
c     print*,'AD Rejects = ',nrAD(nshp,nalf,nsiz,nalt)
c     print*,'CV Rejects = ',nrCV(nshp,nalf,nsiz,nalt)
c     print*,'X2 Rejects = ',nrX2(nshp,nalf,nsiz,nalt)
c
      return
      end


c  END  SUBROUTINE  COMPAR 

c***************************************************************** 
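COMPAR's counting rule is that a rejection is tallied whenever a test statistic exceeds its critical value, and the fraction of rejections over the Monte Carlo repetitions is the estimated power. A minimal Python sketch of that rule follows. The function name `count_rejections` and the five statistic values are hypothetical and are not thesis results; only the critical value .3676251 is taken from the listing, where it is KScrit(1,1,1), the entry for c = 1.0, alpha = .05, n = 5.

```python
def count_rejections(statistics, critical_value):
    """Count the test statistics that exceed the critical value."""
    return sum(1 for s in statistics if s > critical_value)


# Hypothetical K-S statistics from five repetitions (illustration only).
ks_stats = [0.21, 0.40, 0.37, 0.15, 0.45]
ks_crit = 0.3676251  # KScrit(1,1,1) from the listing above

rejects = count_rejections(ks_stats, ks_crit)
power = rejects / len(ks_stats)  # fraction of repetitions that reject
print(rejects, power)
```

Like the Fortran, the sketch uses a strict comparison, so a statistic exactly equal to the critical value is not counted as a rejection.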




Captain James E. Porter III was born in Tokyo, Japan, on
24 September 1931. He graduated from Judson High School,
Converse, Texas, in 1969. He then attended the University of
Texas at Austin and in 1974 graduated Phi Beta Kappa with a
Bachelor of Science degree in Mathematics.

Upon completing Officer Training School and receiving
his USAF commission in April 1973, he was assigned to the
Space Systems (now called Space Operations) career field, Air
Force Specialty Code (AFSC) 20XX. He served as a Space
Surveillance Officer at the Sea-Launched Ballistic Missile
Detection and Warning radar site, Fort Fisher AFS, North
Carolina, from June 1975 to May 1977; and at the Ballistic
Missile Early Warning System radar site, Thule, Greenland,
from May 1977 to May 1978.

From June 1978 to May 1981 Captain Porter was assigned
to Headquarters North American Aerospace Defense Command,
Peterson AFB, Colorado, as a Space Systems Staff Officer. He
next served as Space Operations Career Management Staff
Officer, Air Force Manpower and Personnel Center, Randolph
AFB, Texas, until May 1984. He then entered the Graduate
Space Operations Program, School of Engineering, Air Force
Institute of Technology.

Address:  4026  Kirby  Drive,  San  Antonio,  Texas  78219. 


REPORT DOCUMENTATION PAGE

Report security classification:  UNCLASSIFIED
Distribution/availability of report:  Approved for public release;
    distribution unlimited
Performing organization report number:  AFIT/GSO/MA/85D-6
Name of performing organization:  School of Engineering
Office symbol:  AFIT/ENS
Address:  Air Force Institute of Technology,
    Wright-Patterson AFB OH 45433-6583
Title:  See Box 19
Personal author:  James E. Porter III, Captain, USAF
Type of report:  MS Thesis
Date of report:  1985 December
COSATI codes:  Field 12, Group 01
Subject terms:  Monte Carlo Method; Statistical Functions; Probability
    Distribution Function; Statistical Analysis; Statistical
    Decision Theory

TITLE:  MODIFIED KOLMOGOROV-SMIRNOV, ANDERSON-DARLING, AND CRAMER-VON MISES TESTS
FOR THE PARETO DISTRIBUTION WITH UNKNOWN LOCATION AND SCALE PARAMETERS

Approved for public release: IAW AFR 190-1. Dean for Research and
Professional Development, Air Force Institute of Technology,
Wright-Patterson AFB OH 45433.

THESIS ADVISOR:  Dr. Albert H. Moore
                 Professor of Mathematics

Abstract security classification:  UNCLASSIFIED
Name of responsible individual:  Prof. Albert H. Moore
Telephone number (include area code):  (513) 255-3098
Office symbol:  AFIT/ENC

DD FORM 1473, 83 APR

19.  ABSTRACT

Modified Kolmogorov-Smirnov (K-S), Anderson-Darling (A-D), and Cramer-von Mises
(C-VM) critical values are generated for the three-parameter Pareto distribution. The
values may be used to test whether a set of observations follows a Pareto distribution
when the location and scale parameters are unspecified and thus must be estimated from
the sample. A Monte Carlo simulation of 5000 repetitions is used to generate critical
values for sample sizes 5(5)30 (i.e., 5 to 30 in increments of 5) and Pareto shape
parameters .5(.5)4.0.

A 5000-repetition Monte Carlo investigation is carried out by using 5, 15, and 25
observations from eight alternate distributions to compare the powers of the K-S, A-D,
C-VM, and Chi-square tests. The power values of the tests are relatively low for a
sample size of five. However, the powers of the modified K-S, A-D, and C-VM tests are
considerably better than the Chi-square test at larger sample sizes. Next to the
Chi-square test, the A-D test has the lowest power in most cases.

A functional relationship is identified between the modified K-S and C-VM test
statistics and the Pareto shape parameter. The critical values are found to be a linear
function of the shape parameters between 1.5 and 4.0.




